From Word to HTML (Part 2): Simple HTML formatting in Microsoft Word

In my last post, I described some of the benefits of using Microsoft Word to prepare documents for web publication. I recommended the HTML Goodies website, particularly the original seven-lesson primer by Joe Burns and “Basic HTML That Everyone Should Know” by Michael Rohde.

Once you’ve learned some basic HTML, here are some simple search-and-replace tasks you can do in Word:

First, save a copy of your document as a new Word file. I usually give it a name such as [article]4HTML.doc.

Add hyperlinks (see the HTML primers for instructions).
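A hyperlink is an a tag whose href attribute holds the web address in straight double quotes. As a minimal sketch in Python 3 (the helper name make_link and the example address are hypothetical), this builds one; html.escape from Python's standard library turns characters such as & into entities so the address can't break the tag.

```python
import html


def make_link(url: str, text: str) -> str:
    """Build an HTML hyperlink (hypothetical helper for illustration).

    quote=True also escapes straight double quotes, so the href value,
    which sits inside straight double quotes, stays well formed.
    """
    safe_url = html.escape(url, quote=True)
    safe_text = html.escape(text, quote=False)
    return f'<a href="{safe_url}">{safe_text}</a>'


print(make_link("https://example.com/?a=1&b=2", "Example"))
# <a href="https://example.com/?a=1&amp;b=2">Example</a>
```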

Add tags for any <i>italics</i> and <b>bold</b> type.

If you have text that isn’t a regular paragraph—such as block indention or bulleted or numbered lists—add the HTML tags for those. (Use <blockquote> and </blockquote> to begin and end block indention; see the HTML primers for creating bulleted or numbered lists—it’s easy.)
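A bulleted list is a ul element and a numbered list is an ol element, with each item wrapped in li tags. Here is a sketch of that pattern in Python 3; the helper name to_html_list is hypothetical, and the item text is left unescaped on the assumption that it may already contain tags such as <i>.

```python
from typing import List


def to_html_list(items: List[str], ordered: bool = False) -> str:
    """Wrap each item in <li> tags inside <ol> (numbered) or <ul> (bulleted)."""
    tag = "ol" if ordered else "ul"
    lines = [f"<{tag}>"]
    # Items are not escaped: they may already hold tags such as <i>.
    lines.extend(f"<li>{item}</li>" for item in items)
    lines.append(f"</{tag}>")
    return "\n".join(lines)


print(to_html_list(["Add hyperlinks", "Add <i>italics</i> tags"]))
# <ul>
# <li>Add hyperlinks</li>
# <li>Add <i>italics</i> tags</li>
# </ul>
```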

Then add <p> at the beginning of each of the regular paragraphs and </p> at the end. You can copy and paste these or use the repeat key (F4) to speed this up.

If you have only regular paragraphs, you can add the tags all at once. Make sure you don’t have any extra paragraph returns between paragraphs, then replace ^p (paragraph return) with </p>^p<p> (or </p>^p^p<p> if you want space between paragraphs when you view the code—it won’t affect the browser view). Add <p> at the beginning of the first paragraph and </p> at the end of the last one, and you’ll have all your paragraph tags in place.
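The same replace can be sketched in Python, assuming, as the Word method does, one paragraph per line with no empty lines between paragraphs. The function name add_paragraph_tags is hypothetical.

```python
def add_paragraph_tags(text: str, blank_line: bool = False) -> str:
    """Wrap each line in <p>...</p>, like replacing ^p with </p>^p<p>.

    Assumes one paragraph per line and no empty lines between paragraphs.
    With blank_line=True the separator mirrors </p>^p^p<p>, which adds a
    blank line between paragraphs in the code but not in the browser view.
    """
    separator = "</p>\n\n<p>" if blank_line else "</p>\n<p>"
    paragraphs = text.strip("\r\n").splitlines()
    # The separator covers every join; the first <p> and last </p> are added here.
    return "<p>" + separator.join(paragraphs) + "</p>"


print(add_paragraph_tags("First paragraph.\nSecond paragraph."))
# <p>First paragraph.</p>
# <p>Second paragraph.</p>
```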

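Next, make sure that all the single and double quotation marks are curved. You can do this by searching and replacing them all. First, check your AutoCorrect settings: in the AutoCorrect dialog box, on the AutoFormat As You Type tab, under Replace as you type, check the box for “Straight quotes” with “smart quotes.” Then, in Find and Replace, type a straight double quote (") in both the Find what and Replace with boxes and choose Replace All; do the same for the single quote ('). Word curls each mark as it replaces it. After you replace them all, make sure that none of them came out backwards.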

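Then uncheck that box, because when you enter any HTML tags that involve quote marks, you need straight, not curved, quote marks. (If you added hyperlinks earlier, make sure the quote marks around their web addresses are still straight.)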
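Deciding which way a quote mark should curl is guesswork, which is why the result needs checking. This Python sketch uses one simple rule, an assumption of ours and not Word's actual AutoFormat logic: a straight quote at the start of the text or after a space, an opening bracket, a dash, or another opening quote becomes an opening quote, and any other becomes a closing quote or apostrophe. It skips anything inside angle brackets so that the quotes in HTML tags stay straight. Like Word, it gets some cases backwards, such as the apostrophe in ’90s. The function name smart_quotes is hypothetical.

```python
import re

# Assumed rule: a straight quote after one of these characters opens a quotation.
OPENING_CONTEXT = set(" \t\r\n([{\u2014\u2013\u201c\u2018")


def smart_quotes(text: str) -> str:
    """Curl straight quotes in running text; leave anything inside <...> unchanged."""
    # The capturing group keeps the tags in the result, at the odd indexes.
    parts = re.split(r"(<[^>]*>)", text)
    out = []
    prev = ""  # last character of running text seen so far (tags are skipped)
    for i, part in enumerate(parts):
        if i % 2 == 1:
            out.append(part)  # an HTML tag: keep its straight quotes
            continue
        for ch in part:
            if ch in "\"'":
                opening = prev == "" or prev in OPENING_CONTEXT
                if ch == '"':
                    ch = "\u201c" if opening else "\u201d"
                else:
                    ch = "\u2018" if opening else "\u2019"
            out.append(ch)
            prev = ch
    return "".join(out)


print(smart_quotes('<p>"Yes," she said.</p>'))
# <p>“Yes,” she said.</p>
```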

Now you are ready to replace the fancy punctuation marks with HTML “ampersand” characters, so called because they all start with an ampersand (&). They use the same numbers as the Windows keyboard shortcuts; for example, an em dash in Windows is alt+0151, and the HTML ampersand character is &#151;.
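You can confirm the correspondence in a few lines of Python: the number typed after alt+0 is the character's code in the Windows-1252 character set, and browsers that follow the HTML5 parsing rules (as does Python's html.unescape) read most numeric references from 128 to 159 as the matching Windows-1252 character. The helper name windows_code is hypothetical.

```python
import html


def windows_code(ch: str) -> int:
    """Return the number typed after alt+0 for ch: its Windows-1252 code."""
    return ch.encode("cp1252")[0]


em_dash = "\u2014"
code = windows_code(em_dash)
entity = f"&#{code};"
print(code, entity)  # 151 &#151;
# HTML5-style parsing maps this reference back to an em dash.
print(html.unescape(entity) == em_dash)  # True
```

Strictly speaking, HTML5 treats references in that range as errors that parsers are required to recover from; the Unicode reference &#8212; (or the named entity &mdash;) is the standards-conforming way to write an em dash.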

Use Word’s find and replace tool to change all these characters in the document to their HTML equivalents. Type these in the Find and Replace boxes: for example, hold down the alt key and type 0147 on the numeric keypad in the Find what box, and type &#147; in the Replace with box. Then choose Replace All:

alt+0147 to &#147; (open double quote mark)
alt+0148 to &#148; (close double quote mark)
alt+0145 to &#145; (open single quote mark)
alt+0146 to &#146; (close single quote mark [apostrophe])
alt+0150 to &#150; (en dash)
alt+0151 to &#151; (em dash)
alt+0133 to &#133; (ellipsis points)

That covers the common special punctuation marks (those that aren’t on your keyboard).
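The whole Replace All list can be written as one table and applied in a single pass. This is a minimal Python sketch, assuming the text already has curved quotation marks; the names ALT_CODES, ENTITY_TABLE, and to_entities are hypothetical, and str.translate is Python's built-in character-substitution method.

```python
# Alt codes (Windows-1252 numbers) for the marks in the Replace All list.
ALT_CODES = {
    "\u201c": 147,  # open double quote mark
    "\u201d": 148,  # close double quote mark
    "\u2018": 145,  # open single quote mark
    "\u2019": 146,  # close single quote mark (apostrophe)
    "\u2013": 150,  # en dash
    "\u2014": 151,  # em dash
    "\u2026": 133,  # ellipsis points
}

# str.translate maps each character's code point to its replacement string.
ENTITY_TABLE = {ord(ch): f"&#{code};" for ch, code in ALT_CODES.items()}


def to_entities(text: str) -> str:
    """Replace every special mark with its &#NNN; reference in one pass."""
    return text.translate(ENTITY_TABLE)


print(to_entities("It\u2019s \u201cdone\u201d\u2014almost\u2026"))
# It&#146;s &#147;done&#148;&#151;almost&#133;
```

Because the table is applied in one pass, the ampersands it inserts are never themselves replaced.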

Then save the file as plain text, change the filename extension from .txt to .htm, and open it in your web browser to see what you have. Some things will probably come out wrong, and that’s OK; now is when they’re easy to fix. You won’t open your plain text file and find 87 font commands around a picture. If you find that half the article is italic, you know that you entered <i> at one point and forgot to add </i> where you wanted the italics to stop. Fix it, save it, refresh the browser view of the web page, and check it again.
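Finding a missing </i> by eye works for a short article; for a longer one, a small checker can point to the line. This sketch, assuming Python 3, uses the standard library's html.parser.HTMLParser; the class name TagBalanceChecker, the helper check_tags, and the set of tags it watches are hypothetical choices for illustration.

```python
from html.parser import HTMLParser


class TagBalanceChecker(HTMLParser):
    """Track watched tags and record any that are opened but never closed."""

    def __init__(self, tags=("p", "i", "b", "blockquote")):
        super().__init__()
        self.tags = set(tags)
        self.stack = []     # (tag, line) for each watched tag still open
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag not in self.tags:
            return
        line = self.getpos()[0]
        if not any(open_tag == tag for open_tag, _ in self.stack):
            self.problems.append(f"line {line}: </{tag}> has no matching <{tag}>")
            return
        # Close the matching tag; anything opened after it was left unclosed.
        while self.stack:
            open_tag, open_line = self.stack.pop()
            if open_tag == tag:
                break
            self.problems.append(f"line {open_line}: <{open_tag}> is never closed")


def check_tags(text: str) -> list:
    """Return a list of problems, in the order found; empty if all tags match."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    leftovers = [f"line {line}: <{tag}> is never closed" for tag, line in checker.stack]
    return checker.problems + leftovers


print(check_tags("<p>One</p>\n<p><i>Two</p>"))
# ['line 2: <i> is never closed']
```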

As you learn more HTML, you can do additional formatting in Word: tables, headings, anchors and page jumps (similar to bookmarks in Word), “no break” tags to keep words together on the screen, and more. If you’re editing for the web, I think you’ll find it much quicker and easier to take care of simple HTML formatting yourself than to give your webmaster a list of problems, such as broken hyperlinks, that someone else has to fix and you then have to check again.


From Word to HTML

Do you need to prepare Microsoft Word documents for web publication?

If you are creating web pages, you can use a web layout program such as Dreamweaver or FrontPage. Even if you are merely preparing text for the web, you can use one of these programs to generate the hypertext markup language (HTML) code so that it’s ready to be placed within a web layout, often via a content management system.

However, when someone “copies text from Word and then pastes it into a [content management system], a ton of proprietary [Microsoft] code is copied along with it,” wrote Michael Rohde in the HTML primer “Basic HTML That Everyone Should Know” on the HTML Goodies website. Content management systems “all treat HTML the same and learning a little HTML now can help,” wrote Rohde.

Also, now and then something will go sour in the code, and it helps if you can fix the code yourself. For this, too, you need a working knowledge of HTML, and a good starting place to acquire this knowledge is HTML Goodies. The free resources—primers and tutorials—get you started, and the original seven-lesson primer by Joe Burns is both easy to understand and entertaining.

Basic HTML coding is easy. Before posting an entry on this blog or on a content management system at work, I put it into HTML and make a web page that I can look at to make sure that the text formatting, such as italics, is correct; that all hyperlinks work; and that typographical characters, such as curved quotation marks, are correctly rendered. You can fix these things with a web layout program, but sometimes the program makes a real hash of things out of sight: I opened the code in one web file to fix something and found 87 font commands surrounding a picture. It’s far better to provide clean HTML for web publication. You can do a lot of the work in Microsoft Word, and formatting for a simple file can be done in a few minutes. (I don’t recommend using Word to make web pages; it creates bloated files that are hard to edit if you need to, say, fix a web address.)

In the next post, I’ll describe some easy HTML formatting you can do in Word before you convert the file to HTML. Meanwhile, if you’re new to HTML, study the seven-lesson primer on HTML Goodies and come back in about a week. (Don’t be afraid of HTML! When my manager first told me, in 2001, that I should learn it, I was reluctant. Then I tried it, and I liked it. Compared to Word, which can yield unexpected, bizarre results, HTML is logical and generally predictable.)