Hints About HTML Formatting

Copyright 1997-2002, 2007 Ronald B. Standler

Contents: Introduction; Creating HTML Documents; Validating HTML Code; Indenting Text; Color; Section Headings; Links; Link Checkers; the robots.txt file; Dashes and Greek Letters; the LINK REL= ___ command

Introduction

Text on the Internet is normally formatted in Hypertext Markup Language (HTML), which is easy to learn.

Of the various books that I have seen on HTML applications, the best is HTML The Definitive Guide by Chuck Musciano and Bill Kennedy, which is published by O'Reilly & Associates.

The web offers abundant advice on style in HTML documents. For example, I like Warren Steel's conservative views at Hints for Web Authors.

Of course, the language specification for HTML 3.2 is posted on the Internet! I have also found the list of all HTML 3.2 commands helpful.

The official HTML is rapidly evolving, but I prefer to use version 3.2 of HTML, which was state-of-the-art in early 1996. Most users of the Internet do not have the latest versions of browsers installed on their computers, so they are not able to see all of the new features in the current version of HTML. Similarly, both Netscape's and Microsoft's browsers support proprietary extensions to HTML, but I choose to avoid most of these extensions, as they can be seen neither by users of the other company's browser nor by users of older browsers. There is an active campaign to avoid browser-specific commands: Best Viewed With Any Browser and Straub's interoperable web page design.

I originally wrote this webpage in July 1997, when Netscape was the dominant webbrowser. Since then I have added more suggestions, but not thoroughly revised the old text.

Creating HTML Documents

For documents currently in a wordprocessor format, there are several easy ways to convert the document to HTML format:

Current versions of WordPerfect allow one to export a document in HTML format.
Use conversion utilities to convert from a wordprocessor format to HTML.
Use a wordprocessor to export an ASCII file, which can then be edited in an HTML editor or in a text editor.

Using an HTML editor (e.g., Adobe PageMill) makes it easy for beginners to set colors and to insert anchors (i.e., links within a document, as in an index or table of contents). After I prepared about six webpages with an HTML editor, I preferred to write HTML code with a text editor, using cut-and-paste from my earlier HTML documents.

Don't forget that the Internet is an international place. For example, avoid dates in the 2/7/97 format, which can be read as either 2 July or 7 February; instead, use at least three letters of the month, as in 2 July 1997. When I post an essay on law, I try to remember to say "law in the USA" or mention "U.S." in front of "Constitution" or "Supreme Court", because it is possible that the reader is from another country.
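For pages generated by a script, the date advice above can be applied automatically. The following Python sketch is only an illustration; the function name unambiguous_date and the hard-coded list of English month names are my assumptions, chosen so that the result does not depend on the locale of the computer running the script.

```python
from datetime import date

# English month names are spelled out here (an assumption of this sketch) so
# that the output does not depend on the locale settings of the computer.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def unambiguous_date(d: date) -> str:
    """Format a date as day, full month name, and four-digit year."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"

print(unambiguous_date(date(1997, 7, 2)))   # 2 July 1997
```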

Validating HTML Code

HTML documents that are posted on the Internet can be proofread for compliance with the HTML specification, by using a validator. My favorite was developed by G. Oskoboiny at the University of Alberta and is now maintained by the W³ Consortium: W3C HTML Validation Service

Indenting Text

One of the most frustrating features of HTML is the small collection of commands for formatting text. One can indent text by using the following sequence of commands:
<DL>

<DD> indented text goes here, with a command at the end of the last line of indented text </DL>

The text that follows /DL is set at the left edge.

If the indented text is a quotation with a citation, then one can use the following sequence of commands:
<DL>

<DD>quoted text goes here
<DT>citation goes here. </DL>

The text that follows the /DL command is set at the left edge.

<DL>

<DT>text goes here: <DD>indented text goes here<BR>
<BR> <DD>another block of indented text goes here
<DT>text goes here: <DD>indented text goes here, followed by </DL>

The text that follows the /DL command is set at the left edge.

Here is another way to get indented text in some web browsers.

<BLOCKQUOTE>This is a block quotation of nonrandom keyboard sequences qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty for this test of the block quotation feature in HTML. </BLOCKQUOTE>

The text that follows /BLOCKQUOTE appears here.

A simple way to indent the first line of a paragraph is to insert &nbsp; at the beginning of the first line: a nonbreaking space that is both preceded and followed by ordinary spaces, which causes the browser to skip three spaces. If one wants to indent farther, simply type:
&nbsp; &nbsp; &nbsp; text goes here<BR>
Because the monospace blank is wider than the proportionally spaced blank, one can indent farther by including the <TT> tag around the nonbreaking spaces:
<TT>&nbsp; &nbsp; &nbsp; &nbsp;</TT> text goes here<BR>

I like to organize a complicated thought as a list. One can use HTML to obtain an indented list with bullets at the left of each entry by using:
<UL>

<LI> indented text goes here. Each item in the list can have a length of many lines, if necessary. There is no command at the end of each item in the list, except at the end of the last line of the last item.
<LI> next item goes here
<LI> another item can go here
<LI> last item in the list is here. </UL>

The text that follows /UL is set at the left edge.

One can use HTML to obtain an indented list with numbers at the left of each entry by using the following commands. One can get a variety of different numbering schemes by using the <OL type=#> command, where # is the character for the first item in the list, chosen from the following: a, A, I, i, 1. If you omit the "type=#", then HTML assumes # = 1.
<OL>

<LI> indented text goes here. Each item in the list can have a length of many lines, if necessary. There is no command at the end of each item in the numbered list.
<LI> next item goes here
<LI> another item can go here
<LI> last item in the list is here. </OL>

The text that follows the /OL is set at the left edge.

Color

The color of text can be set with the <FONT COLOR="#rrggbb"> command, where rr is a two-digit hexadecimal (base 16) code for red, gg a two-digit code for green, and bb a two-digit code for blue. Values range from zero (00) to full (FF). Computers in the 1990s often used graphics cards that supported only 256 color values, and for those now-antique cards, values of FF, CC, 99, 66, 33, or 00 were preferred for each color. For more information, see the palettes at:

Lynda Weinman's color charts for browsers by value or hue
U. Tex. at Austin color charts
Color Schemer program for displaying colors
VisiBone click on color to get hex value
Peter Forret's program to display specified colors
search Google for the query "color charts HTML"
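To make the #rrggbb notation concrete, here is a small Python sketch that splits a color code into its red, green, and blue values and tests whether every channel is one of the six values preferred for 256-color cards. The names parse_color and is_web_safe are hypothetical, chosen only for this illustration.

```python
# The six channel values that were preferred for 256-color graphics cards.
WEB_SAFE_LEVELS = {0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF}

def parse_color(code: str) -> tuple[int, int, int]:
    """Split '#rrggbb' into integer red, green, and blue values (0 to 255)."""
    if len(code) != 7 or not code.startswith("#"):
        raise ValueError(f"expected '#rrggbb', got {code!r}")
    # Each channel is two hexadecimal digits: positions 1-2, 3-4, and 5-6.
    return tuple(int(code[i:i + 2], 16) for i in (1, 3, 5))

def is_web_safe(code: str) -> bool:
    """True if every channel is one of the six preferred values."""
    return all(channel in WEB_SAFE_LEVELS for channel in parse_color(code))

print(parse_color("#990000"))   # (153, 0, 0)
print(is_web_safe("#990000"))   # True
print(is_web_safe("#DDDDDD"))   # False
```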

I have made a conservative choice for colors of background and links in my documents:
<BODY BGCOLOR="#DDDDDD" TEXT="#000000" LINK="#0000FF" VLINK="#990000" ALINK="#FF0000">
which uses a brighter gray background than the Netscape 3.0 default. The user may also change the color in the browser (e.g., in Netscape pull down the Edit menu, then select Preferences, then Appearance, followed by Colors). If the user selects green text and you specify a BGCOLOR that is green, then your text will be invisible to that user! If the user selects a yellow background and you specify yellow text, then your text will be invisible to that user! Users who specify colors in their Netscape browsers should consider also selecting "always use my colors, overriding document", so there is a consistent set of contrasting choices. Authors who specify colors with the <BODY> command should specify all five items, so there is a consistent set of contrasting choices.
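The advice to specify all five color items in the <BODY> command can be checked mechanically. The sketch below uses Python's standard html.parser module; the class name BodyColorCheck and the function name missing_body_colors are hypothetical. It reports which of the five attributes are absent from the first <BODY> tag in a document.

```python
from html.parser import HTMLParser

# The five color attributes that should appear together in <BODY>.
REQUIRED = ("bgcolor", "text", "link", "vlink", "alink")

class BodyColorCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing = None   # stays None if no <BODY> tag is seen

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        if tag == "body" and self.missing is None:
            present = {name for name, _ in attrs}
            self.missing = [a for a in REQUIRED if a not in present]

def missing_body_colors(html_text):
    """Return the color attributes absent from the first <BODY> tag."""
    checker = BodyColorCheck()
    checker.feed(html_text)
    return checker.missing

complete = ('<BODY BGCOLOR="#DDDDDD" TEXT="#000000" LINK="#0000FF" '
            'VLINK="#990000" ALINK="#FF0000">')
print(missing_body_colors(complete))                     # []
print(missing_body_colors('<BODY BGCOLOR="#FFDDDD" >'))  # four names missing
```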

Authors of HTML pages might consider avoiding the <FONT COLOR="#rrggbb"> command, because the author's choice of color for text could be invisible against the background color specified in the user's browser. Instead, emphasis can be obtained with the

<B> command for bold,
<I> command for italics, and
<BIG> command for one size larger text.

On the other hand, use of color gives spice to a document that makes it more pleasing to read. Using dark colors (e.g., black = 000000, blue = 0000CC, or red = 990000) is unlikely to conflict with background colors chosen by a user, which tend to be bright. Example bright background colors include:

grey = #DDDDDD (my usual choice)

white = #FFFFFF

pink = #FFDDDD

green = #CCFFCC

yellow = #FFFFCC

cyan = #CCFFFF

magenta = #FFDDFF

purple = #EEDDFF

Do not use my names for colors in HTML code; instead, use the six hexadecimal digits, e.g.,
<BODY BGCOLOR="#FFDDDD" >
for pink.

I like Warren Steel's conservative comments about the FONT command.

Section Headings

There are two ways to make a heading. The hard way:
<CENTER><FONT SIZE="+2">heading</FONT></CENTER>
and the easy way:
<H2 ALIGN=CENTER>another heading</H2>
Not only is the second way easier, but some search engines allegedly look for words in <H#> to collect for indexing. However, each <H#> command acts as if it had two gratuitous <BR> commands at the end.

<H4> corresponds to normal size text </H4>

<H3> corresponds to FONT SIZE="+1" text </H3>

<H2> corresponds to FONT SIZE="+2" text </H2>

Links

Hypertext links are what turn the Internet from a large bulletin board into a useful resource that includes search engines. Therefore, it is critical that people who post pages on the Internet carefully choose the URL of their documents (including file names for each document), so that links to these URLs will be stable.

Two professors at the University of Nebraska at Lincoln developed three online biochemistry classes, with 515 links. When they found that they were spending about four hours/month checking and revising links on their pages, they did some research and found that the half-life of their links was only 58 months in mid-2002. Here is the current version of their Report on link rot.
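To see what a half-life of 58 months means for a collection of 515 links, one can assume simple exponential decay, so that the expected number of working links is halved every 58 months. That decay model is my assumption for illustration (the report itself may use a different fit), and the function name surviving_links is hypothetical.

```python
def surviving_links(initial: int, months: float, half_life: float = 58.0) -> float:
    """Expected number of links still working, assuming exponential decay."""
    return initial * 0.5 ** (months / half_life)

# Expected survivors after one year, one half-life, and two half-lives.
for months in (12, 58, 116):
    print(months, "months:", round(surviving_links(515, months), 1))
```

Under this assumption, about half of the 515 links (257.5) would still work after 58 months, and about a quarter after 116 months.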

Sometimes, it will be necessary to change a URL. When that happens it is good etiquette, as well as necessary for stability of links, to replace the old document with a short document that refers the reader to the new URL. For example, in August 1998, I moved my essays on law and technology from a website at CompuServe to my own domain name (www.rbs2.com/). Here is an example of how I redirected a reader who requested my essay on the response of law to new technology, file name = lt.htm
At the old URL, ourworld.compuserve.com/homepages/rstandler/lt.htm
I posted a brief document that contained the following code:

<HTML>
<HEAD>
<META NAME="ROBOTS" CONTENT="noindex, follow">
<META NAME="AUTHOR" CONTENT="Dr. Ronald B. Standler">
<META NAME="DESCRIPTION" CONTENT="this page has moved to www.rbs2.com/lt.htm">
<META HTTP-EQUIV="REFRESH" CONTENT="1; URL=http://www.rbs2.com/lt.htm">
<TITLE>new URL for this document</TITLE>
</HEAD>
<BODY>
<BIG> The document that you have requested is now located at<TT>
<A HREF="http://www.rbs2.com/lt.htm">www.rbs2.com/lt.htm</A></TT></BIG><BR>
<BR>
Please wait while you are shuffled around the Internet to the current location of the document.<BR>
<BR>
</BODY>
</HTML>

Notes:

The META ROBOTS noindex command tries to prevent search engines from indexing the page at the old URL. The follow command tells search engines to follow the link(s) on the webpage, i.e., to link to the new location of the essay.
The META DESCRIPTION and TITLE lines are included in case a search engine indexes this page.
The META HTTP-EQUIV line automatically transfers the reader to the new URL.
The text in the BODY of the page gives the user something to read while his/her web browser is being redirected to the new URL. As a redundant precaution, and also to give search engine robots something to index, I also included a link to the new URL in the BODY of the page.
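When many referring pages are maintained, it can be useful to confirm that each one points to the intended new URL. The following Python sketch reads the META HTTP-EQUIV="REFRESH" line and returns the delay in seconds and the target URL. The names RefreshFinder and find_refresh are hypothetical, and the sketch only handles the "seconds; URL=address" form of CONTENT used in the example above.

```python
from html.parser import HTMLParser

class RefreshFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.refresh = None   # becomes (delay_seconds, target_url) once found

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)   # attribute names arrive in lower case
        if tag != "meta" or (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # CONTENT has the form "seconds; URL=address".
        delay, _, target = (attrs.get("content") or "").partition(";")
        delay, target = delay.strip(), target.strip()
        if target.lower().startswith("url="):
            target = target[4:]
        if delay.isdigit():
            self.refresh = (int(delay), target)

def find_refresh(html_text):
    """Return (delay, url) from a META refresh line, or None if absent."""
    finder = RefreshFinder()
    finder.feed(html_text)
    return finder.refresh

page = ('<HEAD><META HTTP-EQUIV="REFRESH" '
        'CONTENT="1; URL=http://www.rbs2.com/lt.htm"></HEAD>')
print(find_refresh(page))   # (1, 'http://www.rbs2.com/lt.htm')
```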

The referring page at the old URL should remain on the Internet for at least six months, preferably at least a year, so that all of the search engines that have indexed the old URL will have done a routine crawl of their database and discovered the URL has changed. Also, the new URL needs to be added to search engines in the usual way. In the meantime, the referring page at the old URL redirects users who have found a link to the old URL.

I deleted all of my documents from CompuServe at the end of February 2001, 30 months after I established my professional website. However, you can find a current copy of a referring page similar to the above code at www.rbs2.com/privacy.html.

Some search engines (e.g., AltaVista, Google, AllTheWeb) can be used to find pages that contain a link to a specific URL. In this way, one can find links that need to be changed. One hopes that the author of each link has put his/her e-mail address at the bottom of the page, or at the bottom of his/her homepage, so they can be notified of the changed URL.

Link Checkers

Approaching the same problem from the other direction, there are services that will automatically check all of the links on your page and report dead links to you.
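The idea behind a link checker can be sketched in a few lines of Python using only the standard library. This is a simplified illustration, not a description of how any particular service works; the names LinkCollector, extract_links, is_alive, and dead_links are hypothetical. It collects the HREF of every <A> tag, resolves relative links against the page's URL, and asks each server whether the document still exists.

```python
from html.parser import HTMLParser
from urllib.error import URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

class LinkCollector(HTMLParser):
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            # Resolve relative links (e.g., lt.htm) against the page's URL.
            self.links.append(urljoin(self.base_url, href))

def extract_links(html_text, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links

def is_alive(url, timeout=10):
    """True if the server answers a HEAD request without an error.
    Some servers refuse HEAD requests, so a False result deserves a manual look."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout):
            return True
    except (URLError, ValueError, OSError):
        return False

def dead_links(html_text, base_url, check=is_alive):
    """Return the links on the page for which check(url) is False."""
    return [url for url in extract_links(html_text, base_url) if not check(url)]
```

The check argument makes it possible to try the sketch without a network connection, by passing a function that pretends certain URLs are dead.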

the `robots.txt` file

In the old days of the internet, one submitted each file name to a search engine for listing. Modern practice is to submit only the domain name (e.g., http://www.sitename.com/ ) to the search engine, which will then send out a "robot" or "spider" to crawl through all of the webpages at that site and bring back data on these pages to list in the search engine. Once a website is listed in a search engine's database, robots/spiders from that search engine will periodically crawl the website, to update the database. However, robots/spiders only descend through a series of links, starting at the website's homepage, and will not find documents that are not linked at the website.

To attempt to prevent a search engine from indexing a particular HTML document, one can include the following line in the header of the HTML file:
<META NAME="ROBOTS" CONTENT="noindex">
However, not all robots/spiders obey this HTML command. Another way to prevent robots/spiders from indexing a webpage is to specify the name of that webpage in a robots.txt file.

How often do robots/spiders visit a website? My professional website (www.rbs2.com/), which has been on the Internet since July 1998 and is included in all of the major search engines, has a robots.txt file that had an average of 30 hits/day during June 2001. During February 2005, my robots.txt file had an average of 78 hits/day, when my whole website had 2030 hits/day. Clearly, a robots.txt file is worthwhile if one wants to exclude some webpages from search engines.

Before I describe the robots.txt file, it is worthwhile to discuss why one might want to post a webpage that is hidden from search engines. For example,

Webpages that are content-free do not belong in search engines' databases. For example: All of the file names at my websites end in .htm; however, some visitors type an .html extension or follow a link with the wrong extension. To allow these visitors to see the page that they seek (e.g., filename.htm), I posted some pages (e.g., filename.html) that redirect visitors who have used the wrong extension. It would be a waste of search engine database space to include filename.html in a search engine, since that file is content-free and only serves to direct a reader to the correct file.
Pages listed in search engines should be reasons why a person would want to go to a website. Other pages do not belong in search engines' databases. For example:
- Terms of service (i.e., license.htm) and disclaimers (i.e., disclaim.htm). These pages are important only to visitors at a website, not a reason to visit a website.
- What's new at my professional website, www.rbs2.com/new.htm, a webpage that I posted for repeat visitors to my professional website, to tell them which essays I have revised since their last visit. There is no reason to list new.htm in search engines.
- My webpage that lists my professional fees. I post that page for potential clients who have already decided that I may be an appropriate attorney or consultant for them. I don't want to assist my competitors who are searching the internet for terms like "legal fees" or "consulting fees".
Files that the webmaster wants to hide from the public, for example:
- The analysis of logfiles that shows the number of hits on each webpage and the referring sites for visitors (i.e., the URL of the page that contained the link that the visitor clicked to arrive at my webpage).
- A draft webpage that is being tested with an HTML validator and shared with a few colleagues for their criticism, prior to public posting.

example of `robots.txt` file

Put the following file in the root directory (i.e., the directory that contains index.htm or homepage.htm) of a website:

User-agent: *
Disallow: /disclaim.htm
Disallow: /license.htm

This example User-agent command specifies that the following lines apply to all robots/spiders. Each Disallow command excludes from the robot's/spider's collection the one named file that is located in the root directory.
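One way to see how a well-behaved robot might interpret the example file is to feed it to the robots.txt parser in Python's standard library. This is only a sketch: the robot name ExampleBot and the helper function allowed are hypothetical, www.sitename.com is the placeholder domain used earlier, and real robots may interpret the file somewhat differently.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /disclaim.htm
Disallow: /license.htm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(path, agent="ExampleBot"):
    """Ask whether the named robot may fetch a path on the placeholder site."""
    return parser.can_fetch(agent, "http://www.sitename.com" + path)

print(allowed("/disclaim.htm"))   # False
print(allowed("/license.htm"))    # False
print(allowed("/index.htm"))      # True
```

This parser treats each Disallow value as a prefix, so the rule for /disclaim.htm also excludes /disclaim.html here.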

resources for `robots.txt` files

Users who are interested in more complicated ways of using a robots.txt file (e.g., websites that have subdirectories) should use a search engine to search for "robots.txt" and browse through various tutorials.

After uploading a robots.txt file to the root directory of a website, that file should be tested. I have found the following validators helpful:

software developer in Italy.
UK Office for Library and Information Networking at the University of Bath.
Simon Wilkinson.

Finally, test the robots.txt file by sending the robot from the link checking program to the root directory (e.g., http://www.sitename.com/) without specifying a file name. The results of the link checker should show the disallowed files that are linked on the homepage as "access denied for robots", or similar nomenclature.

Dashes and Greek Letters

For many years, the en-dash and em-dash (so-called because the dashes are the width of the characters n and m, respectively) were represented in HTML by the codes:
&#150; &#151;

Towards the end of 2001, the HTML validator at the W3C site began declaring those codes invalid, because the proper SGML codes are:
&ndash; &mdash;
– —

The official table of SGML characters is posted at the W3C website.
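The relationship between the old and new dash codes can be checked with Python's standard html module. This sketch assumes that the old codes were the numeric references &#150; and &#151;. Python's html.unescape, which follows the HTML5 parsing rules, maps those two out-of-range numbers to the same characters as &ndash; and &mdash;, which may explain why many old pages still display dashes correctly even though the codes do not validate.

```python
import html

proper = html.unescape("&ndash; &mdash;")
legacy = html.unescape("&#150; &#151;")   # assumed form of the old codes

# Show the Unicode code points of the two dashes.
print([f"U+{ord(c):04X}" for c in proper if c != " "])   # ['U+2013', 'U+2014']
print(proper == legacy)                                   # True
```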

Greek Letters

The old way to display Greek Letters in HTML is to use the Symbol font command:
<FONT FACE="Symbol"> l p </FONT>
l p
A table of the Greek letters is given by Prof. Lovelock at the University of Arizona.
The above-mentioned way of displaying Greek letters does not work in Firefox or Google Chrome on an Apple computer in the year 2010.

Another way to include Greek letters in HTML documents is to use HTML 4 characters:
alpha beta gamma delta epsilon Delta
&alpha; &beta; &gamma; &delta; &epsilon; &Delta;
α β γ δ ε Δ
A list of HTML4 character codes is given in a table by Prof. Barzilai at Salisbury University in Maryland.

One can also use Unicode:
alpha beta gamma delta epsilon Delta
&#x3B1; &#x3B2; &#x3B3; &#x3B4; &#x3B5; &#x394;
α β γ δ ε Δ
Unicode charts of Greek letters, math symbols. Menu of all Unicode charts.

Helpful table of Unicode in HTML documents. (choose option for hexadecimal numerical HTML encoding of the Unicode character). Greek letters begin on page 4 (start=768). Tables by Tomas Schild of Tübingen Germany.
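Rather than looking up each Greek letter in a table, one can generate both kinds of code with a short Python sketch. The function name references is hypothetical; html.entities.codepoint2name is the standard library's table of HTML 4 character names. The sketch prints the named code and the hexadecimal Unicode code for each letter, and confirms that both decode to the same character.

```python
import html
from html.entities import codepoint2name

def references(ch: str) -> tuple[str, str]:
    """Return the named and hexadecimal numeric codes for one character."""
    code = ord(ch)
    return f"&{codepoint2name[code]};", f"&#x{code:X};"

for ch in "αβγδεΔ":
    named, numeric = references(ch)
    # Both forms must decode back to the original letter.
    assert html.unescape(named) == html.unescape(numeric) == ch
    print(ch, named, numeric)
```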

<LINK REL = ___> command

The HTML specification includes a rarely-used command that permits a webbrowser to link to related documents (e.g., previous or next page in a series of webpages, homepage, copyright notice for website, etc.) or to send e-mail to the author of the website. One reason that this command is little used is that most webbrowsers before the year 2002 did not support this command.

Examples of how to use this command:
<HEAD>
<LINK REV="made" HREF="mailto:webmaster@xyz.com">
<LINK REL="home" HREF="index.htm" TITLE="Homepage">
<LINK REL="copyright" HREF="tos.htm" TITLE="Terms of Service">
</HEAD>
There are some "official" values of the LINK REL="___" phrase that correspond to symbols shown in some webbrowsers. For a list of the official values used by the i-Cab browser, one of the first webbrowsers to implement this command, see the FAQ under "I want to add some LINK tags ...."

For more information on this command, type the following into a search engine:
"LINK REL" made index home copyright
and follow the results.

http://www.rbs0.com/hints.htm
created 14 July 1997, text revised 28 July 2013, links updated 1 June 2014
