In my last post, I described some of the benefits of using Microsoft Word to prepare documents for web publication. I recommended the HTML Goodies website, particularly the original seven-lesson primer by Joe Burns and “Basic HTML That Everyone Should Know” by Michael Rohde.
Once you’ve learned some basic HTML, here are some simple search-and-replace tasks you can do in Word:
First make a new Word file. I usually save it with a name such as [article]4HTML.doc.
Add hyperlinks (see the HTML primers for instructions).
Add tags for any <i>italics</i> and <b>bold</b> type.
If you have text that isn’t a regular paragraph—such as block indention or bulleted or numbered lists—add the HTML tags for those. (Use <blockquote> and </blockquote> to begin and end block indention; see the HTML primers for creating bulleted or numbered lists—it’s easy.)
Then add <p> at the beginning of each of the regular paragraphs and </p> at the end. You can copy and paste these or use the repeat key (F4) to speed this up.
If you have only regular paragraphs, you can add the tags all at once. Make sure you don’t have any extra paragraph returns between paragraphs, then replace ^p (paragraph return) with </p>^p<p> (or </p>^p^p<p> if you want space between paragraphs when you view the code—it won’t affect the browser view). Add <p> at the beginning of the first paragraph and </p> at the end of the last one, and you’ll have all your paragraph tags in place.
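If you ever want to do the same paragraph tagging outside Word, the replace described above can be sketched in a few lines of Python. This is only an illustration, not part of the Word workflow: the function name `wrap_paragraphs` is made up for the example, and it assumes the text has one paragraph per line.

```python
def wrap_paragraphs(text: str) -> str:
    """Wrap each non-empty line in <p>...</p>, assuming one paragraph per line."""
    # Drop blank lines, mirroring the advice to remove extra paragraph returns first.
    paragraphs = [line.strip() for line in text.splitlines() if line.strip()]
    # A blank line between paragraphs only affects the code view, not the browser view.
    return "\n\n".join(f"<p>{p}</p>" for p in paragraphs)


sample = "First paragraph.\n\nSecond paragraph."
print(wrap_paragraphs(sample))
```

Because every paragraph is wrapped individually, there is no need to add the first `<p>` and the last `</p>` by hand.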
Then make sure that all the single and double quotation marks are curved. You can do this by searching and replacing them all. First, check your AutoCorrect settings: on the AutoFormat As You Type tab, under Replace as you type, check the box for “Straight quotes” with “smart quotes.” Then replace each quotation mark with itself (" with " and ' with '), and Word will re-enter each one as a curved quote. After you replace them all, make sure that none of them came out backwards.
Then uncheck that box, because when you enter any HTML tags that involve quote marks, you need straight, not curved, quote marks.
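A curved quote that slips into a tag is hard to spot by eye. As a rough check, a short Python sketch like the one below could list any tags that contain curved quotes. The function name `tags_with_curly_quotes` is hypothetical, and the pattern is a simple approximation that treats anything between `<` and `>` as a tag.

```python
import re

# Curved double and single quotes, written as escapes so they are easy to see.
CURLY_QUOTES = "\u201c\u201d\u2018\u2019"


def tags_with_curly_quotes(html: str) -> list:
    """Return every tag that contains a curved quote mark."""
    tags = re.findall(r"<[^>]+>", html)
    return [tag for tag in tags if any(q in tag for q in CURLY_QUOTES)]


sample = '<a href=\u201cpage.htm\u201d>bad link</a> <a href="page.htm">good link</a>'
print(tags_with_curly_quotes(sample))
```

Only the first link in the sample is reported, because its `href` value is wrapped in curved quotes.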
Now you are ready to replace the fancy punctuation marks with HTML “ampersand” characters, so called because they all start with an ampersand (&). They use the same numbers as the Windows keyboard shortcuts; for example, an em dash in Windows is alt+0151, and the HTML ampersand character is &#151;.
Use Word’s find and replace tool to change each of these characters in the document to its HTML equivalent. Type the character in the Find box and the code in the Replace box: for example, hold down the Alt key and type 0147 on the numeric keypad in the Find box, and type &#147; in the Replace box. Choose Replace All:
- alt+0147 to &#147; (open double quote mark)
- alt+0148 to &#148; (close double quote mark)
- alt+0145 to &#145; (single open quote mark)
- alt+0146 to &#146; (single close quote mark [apostrophe])
- alt+0150 to &#150; (en dash)
- alt+0151 to &#151; (em dash)
- alt+0133 to &#133; (ellipsis points)
That covers the common special punctuation marks (those that aren’t on your keyboard).
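The seven replacements above amount to a small lookup table, which can be sketched in Python as follows. This is an illustration rather than a replacement for the Word steps. The names `PUNCTUATION_TO_HTML` and `encode_punctuation` are made up for the example, and the codes are the Windows-numbered ones from the list (browsers read them as the matching Windows characters; named references such as `&mdash;` are an alternative).

```python
# Each curved or special punctuation mark, paired with the code from the list above.
PUNCTUATION_TO_HTML = {
    "\u201c": "&#147;",  # open double quote mark (alt+0147)
    "\u201d": "&#148;",  # close double quote mark (alt+0148)
    "\u2018": "&#145;",  # single open quote mark (alt+0145)
    "\u2019": "&#146;",  # single close quote mark or apostrophe (alt+0146)
    "\u2013": "&#150;",  # en dash (alt+0150)
    "\u2014": "&#151;",  # em dash (alt+0151)
    "\u2026": "&#133;",  # ellipsis points (alt+0133)
}


def encode_punctuation(text: str) -> str:
    """Replace each special punctuation mark with its HTML ampersand character."""
    for mark, code in PUNCTUATION_TO_HTML.items():
        text = text.replace(mark, code)
    return text


print(encode_punctuation("\u201cIt\u2019s easy,\u201d she said\u2026"))
```

The order of the replacements does not matter here, because none of the codes contains a character that is itself being replaced.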
Then save the file as plain text, change the filename extension from txt to htm, and open it in your browser to see what you have. Some things will probably come out wrong, and that’s OK. This is the stage at which they are easy to fix: you won’t open your plain text file and find 87 font commands around a picture. If you find that half the article is italic, you know that you entered <i> at one point and forgot to add </i> where you want the italics to stop. Fix it, save it, refresh the browser view of the web page, and check it again.
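A forgotten closing tag like that can also be caught by counting. The Python sketch below compares the number of opening and closing tags for a few tag names. It is a rough check under simple assumptions, not a real HTML validator, and the function name `unbalanced_tags` and its default tag list are hypothetical.

```python
import re


def unbalanced_tags(html: str, tags=("i", "b", "p", "blockquote")) -> dict:
    """Return {tag: opens minus closes} for each tag whose counts differ."""
    result = {}
    for tag in tags:
        # The opening tag may carry attributes; requiring ">" or whitespace
        # right after the name keeps "<b" from matching "<blockquote>".
        opens = len(re.findall(rf"<{tag}(?:\s[^>]*)?>", html, flags=re.IGNORECASE))
        closes = len(re.findall(rf"</{tag}\s*>", html, flags=re.IGNORECASE))
        if opens != closes:
            result[tag] = opens - closes
    return result


print(unbalanced_tags("<p>Half the article is <i>italic.</p>"))  # {'i': 1}
```

A positive number means a closing tag is missing; a negative number means there is a closing tag with no opening tag to match.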
As you learn more HTML, you can do additional formatting in Word: tables, headings, anchors and page jumps (similar to bookmarks in Word), “no break” tags to keep words together on the screen, and more. If you’re editing for the web, I think you’ll find it much quicker and easier to take care of simple HTML formatting yourself than to give your webmaster a list of things, such as broken hyperlinks, that need to be fixed by someone else and then checked again by you.