Hints About HTML Formatting
Copyright 1997-2002, 2007 Ronald B. Standler
Table of Contents
Introduction
creating HTML documents
validating HTML
Indenting Text
Color
Section Headings
Links
link checkers
robots.txt file
Dashes and Greek Letters
LINK REL= ___ command
Introduction
Text on the Internet is normally formatted in Hypertext Markup
Language (HTML), which is easy to learn.
Of the various books that I have seen on HTML applications, the best
is HTML: The Definitive Guide by Chuck Musciano and Bill Kennedy,
which is published by O'Reilly & Associates.
There are abundant sources of information on style in HTML documents on the
web. For example, I like Warren Steel's conservative views at
Hints for Web Authors.
Of course, the language specification for
HTML 3.2
is posted on the Internet! I have also found the list of all
HTML 3.2 commands helpful.
The official HTML is rapidly evolving, but I prefer to use version 3.2 of HTML,
which was state-of-the-art in early 1996. Most users of the Internet
do not have the latest versions of browsers
installed on their computer, so they are not able to see all of the new features
in the current version of HTML. Similarly, both Netscape's and Microsoft's
browsers support proprietary extensions to HTML, but I choose to avoid most
of these extensions, as they can be seen neither by users of the
other company's browser nor by users of older browsers.
There is an active campaign to avoid browser-specific commands:
Best Viewed With Any Browser
and
Straub's interoperable web page
design.
I originally wrote this webpage in July 1997, when Netscape was the dominant
webbrowser. Since then I have added more suggestions, but I have not thoroughly revised
the old text.
Creating HTML Documents
For documents currently in a wordprocessor format, there are several
easy ways to convert the document to HTML format:
- Current versions of WordPerfect allow one to export a document in HTML format.
- Use conversion utilities to convert from a wordprocessor format to HTML.
- Use a wordprocessor to export an ASCII file, which can then be edited in
an HTML editor or in a text editor.
Using an HTML editor (e.g., Adobe PageMill) makes it easy for beginners
to set colors and to insert anchors (i.e., links within a document,
as in an index or table of contents).
After I prepared about six webpages with an HTML editor,
I preferred to write HTML code with a text editor,
using cut-and-paste from my earlier HTML documents.
Don't forget that the Internet is an international place. For example,
avoid dates in the 2/7/97 format; instead, use at least three letters of the
month, as in 2 July 1997. When I post an essay on law, I try to
remember to say "law in the USA" or mention "U.S." in front of "Constitution"
or "Supreme Court", because it is possible that the reader is from
another country.
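The date advice can be sketched in a few lines of Python (a language not otherwise used on this page). The function name international_date is hypothetical, and the month names are written out in the code so that the result does not depend on the locale of the computer that runs it.

```python
from datetime import date

# English month names are listed explicitly, so the output is the same
# on every computer regardless of its locale settings.
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def international_date(d: date) -> str:
    """Return a date such as '2 July 1997', with the month written as a word."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


print(international_date(date(1997, 7, 2)))  # prints: 2 July 1997
```

A reader in the USA would read 2/7/97 as 7 February, while many readers elsewhere would read it as 2 July; the spelled-out month removes that ambiguity.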
Validating HTML Code
HTML documents that are posted on the Internet can be proofread for compliance
with the HTML specification, by using a validator. My favorite
was developed by G. Oskoboiny at the University of Alberta and is now
maintained by the W3 Consortium:
W3C HTML Validation Service
Indenting Text
One of the most frustrating features of HTML is the small collection of
commands for formatting text. One can indent text by using the
following sequence of commands:
<DL>
- <DD> indented text goes here, with a command at the end
of the last line of indented text </DL>
The text that follows /DL is set at the left edge.
If the indented text is a quotation with a citation, then one can use the
following sequence of commands:
<DL>
- <DD>quoted text goes here
- <DT>citation goes here. </DL>
The text that follows the /DL command is set at the left edge.
<DL>
- <DT>text goes here
- <DD>indented text goes here<BR>
<BR>
- <DD>another block of indented text goes here
- <DT>text goes here
- <DD>indented text goes here, followed by </DL>
The text that follows the /DL command is set at the left edge.
Here is another way to get indented text in some web browsers.
<BLOCKQUOTE>This is a block quotation of nonrandom keyboard sequences
qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty
qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty
qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty qwerty
for this test of the block quotation feature in HTML. </BLOCKQUOTE>
The text that follows /BLOCKQUOTE appears here.
A simple way to indent the first line of a paragraph is to insert
at the beginning of the first line
a nonbreaking space (&nbsp;) that is both preceded and followed
by ordinary spaces, which causes the browser to skip three spaces.
If one wants to indent farther, simply type:<BR>
&nbsp; &nbsp; text goes here<BR>
Because the monospace blank is wider than the proportionally spaced blank,
one can indent farther by including the <TT> tag around the nonbreaking spaces:
<TT> &nbsp; &nbsp; </TT> text goes here<BR>
I like to organize a complicated thought as a list. One can use HTML to obtain
an indented list with bullets at the left of each entry by using:
<UL>
- <LI> indented text goes here. Each item in the list can have a length
of many lines, if necessary. There is no command at the end of each item in the list,
except at the end of the last line of the last item.
- <LI> next item goes here
- <LI> another item can go here
- <LI> last item in the list is here. </UL>
The text that follows /UL is set at the left edge.
One can use HTML to obtain an indented list with numbers at the left
of each entry by using the following commands. One can get a variety of
different numbering schemes by using the <OL type=#> command,
where # is the character for the first item in the list, chosen from the
following: a, A, I, i, 1. If you omit the "type=#", then HTML assumes
# = 1.
<OL>
- <LI> indented text goes here. Each item in the list can have a length
of many lines, if necessary.
There is no command at the end of each item in the numbered list.
- <LI> next item goes here
- <LI> another item can go here
- <LI> last item in the list is here. </OL>
The text that follows the /OL is set at the left edge.
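To see which labels each type value produces, the numbering can be imitated in Python. This is only a sketch: the function names to_roman and list_label are hypothetical, lettered lists are limited to 26 items here, and a real browser generates the labels itself.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to an upper-case Roman numeral."""
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
             (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
             (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def list_label(index: int, list_type: str = "1") -> str:
    """Label of item number `index` (counting from 1) for a given <OL type=...>."""
    if list_type == "1":
        return str(index)
    if list_type in ("a", "A"):
        if not 1 <= index <= 26:
            raise ValueError("this sketch handles only 26 lettered items")
        letter = chr(ord("a") + index - 1)
        return letter if list_type == "a" else letter.upper()
    if list_type in ("i", "I"):
        roman = to_roman(index)
        return roman.lower() if list_type == "i" else roman
    raise ValueError("type must be one of: a, A, i, I, 1")


for list_type in ("a", "A", "i", "I", "1"):
    labels = ", ".join(list_label(n, list_type) for n in range(1, 5))
    print(f"type={list_type}: {labels}")
```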
Color
The color of text can be set with the <FONT COLOR="#rrggbb">
command, where rr is a two-digit hexadecimal (base 16) code for red,
gg a two-digit code for green, and bb a two-digit code for blue.
Values range from zero (00) to full (FF).
Computers in the 1990s often used graphics cards that supported only 256 colors,
and for those now-antique cards, values of
FF, CC, 99, 66, 33, or 00 were preferred for each color.
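As an illustration of the six preferred values, the following Python sketch rounds each of the red, green, and blue values in a color to the nearest of 00, 33, 66, 99, CC, and FF. The function name nearest_web_safe is hypothetical, and rounding to the nearest value is an assumption about how one might choose a substitute color.

```python
# The six values that were preferred for each color on 256-color cards.
WEB_SAFE_LEVELS = (0x00, 0x33, 0x66, 0x99, 0xCC, 0xFF)


def nearest_web_safe(color: str) -> str:
    """Round each channel of '#rrggbb' to the closest preferred value."""
    digits = color.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected a color of the form #rrggbb")
    channels = [int(digits[i:i + 2], 16) for i in (0, 2, 4)]
    snapped = [min(WEB_SAFE_LEVELS, key=lambda level: abs(level - c))
               for c in channels]
    return "#" + "".join(f"{value:02X}" for value in snapped)


print(nearest_web_safe("#DDDDDD"))  # prints: #CCCCCC
print(nearest_web_safe("#990000"))  # prints: #990000
```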
For more information, see the palettes at:
I have made a conservative choice for colors of background and links in my documents:
<BODY BGCOLOR="#DDDDDD" TEXT="#000000" LINK="#0000FF" VLINK="#990000" ALINK="#FF0000">
which uses a brighter gray background than the Netscape 3.0 default.
The user may also change the color in the browser
(e.g., in Netscape pull down the Edit menu, then select Preferences, then Appearance, followed by Colors).
If the user selects green text and you specify a BGCOLOR that is green, then your text will be
invisible to that user! If the user selects a yellow background and you
specify yellow text, then your text will be invisible to that user!
Users who specify colors in their Netscape browsers should consider also selecting
"always use my colors, overriding document", so there is a consistent set of contrasting choices.
Authors who specify colors with the <BODY> command should specify
all five items, so there is a consistent set of contrasting choices.
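A quick way to check that a page specifies all five colors is sketched below with Python's standard html.parser module. The class name BodyColorCheck and the function name missing_body_colors are hypothetical; the sketch only reports which of the five attributes are absent and does not judge whether the chosen colors contrast well.

```python
from html.parser import HTMLParser

# The five color attributes of <BODY>; the parser lower-cases attribute names.
REQUIRED = ("bgcolor", "text", "link", "vlink", "alink")


class BodyColorCheck(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing = None  # stays None if no <BODY> tag is found

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            present = {name for name, _ in attrs}
            self.missing = [name for name in REQUIRED if name not in present]


def missing_body_colors(html_text: str):
    """Return the <BODY> color attributes that the page does not specify."""
    checker = BodyColorCheck()
    checker.feed(html_text)
    checker.close()
    return checker.missing


print(missing_body_colors('<BODY BGCOLOR="#FFDDDD" >'))
# prints: ['text', 'link', 'vlink', 'alink']
```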
Authors of HTML pages might consider avoiding the <FONT COLOR="#rrggbb"> command,
because the author's choice of color for text could be invisible against the
background color specified in the user's browser. Instead, emphasis
can be obtained with the
- <B> command for bold,
- <I> command for italics, and
- <BIG> command for one size larger text.
On the other hand, use of color gives spice to a document that makes
it more pleasing to read. Using dark colors (e.g., black = 000000,
blue = 0000CC, or
red = 990000) is
unlikely to conflict with background colors chosen by a user,
which tend to be bright. Example bright background colors include:
grey = #DDDDDD (my usual choice)
white = #FFFFFF
pink = #FFDDDD
green = #CCFFCC
yellow = #FFFFCC
cyan = #CCFFFF
magenta = #FFDDFF
purple = #EEDDFF
Do not use my names for colors in HTML code; instead, use the
six hexadecimal digits: e.g.,
<BODY BGCOLOR="#FFDDDD" >
for pink.
I like Warren Steel's conservative comments about the
FONT command.
Section Headings
There are two ways to make a heading. The hard way:
<CENTER><FONT SIZE="+2">heading</FONT></CENTER>
and the easy way:
<H2 ALIGN=CENTER>another heading
</H2>
Not only is the second way easier, but some search engines allegedly
look for words in <H#> to collect for indexing. However, each
<H#> command acts as if it had two gratuitous <BR> commands
at the end.
<H4> corresponds to normal size text </H4>
<H3> corresponds to FONT SIZE="+1" text </H3>
<H2> corresponds to FONT SIZE="+2" text </H2>
Links
Hypertext links are what turn the Internet from a large bulletin board
into a useful resource that includes search engines.
Therefore, it is critical that people who post pages on the Internet
carefully choose the URL of their documents (including file names for each
document), so that links to these URLs will be stable.
Two professors at the University of Nebraska at Lincoln developed
three online biochemistry classes, with 515 links.
When they found that they were spending about four hours/month
checking and revising links on their pages, they did some research and
found that the half-life of their links was only 58 months in mid-2002.
Here is the current version of their
Report
on link rot.
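To get a feeling for what a half-life of 58 months means, one can assume that links die off exponentially (that assumption is implied by the term half-life, but it is not a quotation from the report) and compute the fraction of links that should still work after a given time. The function name surviving_fraction is hypothetical.

```python
def surviving_fraction(months: float, half_life_months: float = 58.0) -> float:
    """Fraction of links still working after `months`, assuming exponential decay."""
    return 0.5 ** (months / half_life_months)


links = 515  # number of links in the three online classes
for months in (12, 58, 116):
    alive = links * surviving_fraction(months)
    print(f"after {months:3d} months: about {alive:.0f} of {links} links still work")
```

Under this assumption, half of the links are dead after 58 months and three quarters are dead after 116 months.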
Sometimes, it will be necessary to change a URL.
When that happens it is good etiquette, as well as necessary for stability
of links, to replace the old document with a short document that
refers the reader to the new URL.
For example, in August 1998, I moved my essays on law and technology
from a website at CompuServe to my own domain name (www.rbs2.com/).
Here is an example of how I redirected a reader who requested my
essay on the response of law to new technology
(file name = lt.htm).
At the old URL, ourworld.compuserve.com/homepages/rstandler/lt.htm,
I posted a brief document that contained the following code:
<HTML>
<HEAD>
<META NAME="ROBOTS" CONTENT="noindex, follow">
<META NAME="AUTHOR" CONTENT="Dr. Ronald B. Standler">
<META NAME="DESCRIPTION" CONTENT="this page has moved to www.rbs2.com/lt.htm">
<META HTTP-EQUIV="REFRESH" CONTENT="1; URL=http://www.rbs2.com/lt.htm">
<TITLE>new URL for this document</TITLE>
</HEAD>
<BODY>
<BIG>
The document that you have requested is now located at<TT>
<A HREF="http://www.rbs2.com/lt.htm">www.rbs2.com/lt.htm</A></TT></BIG><BR>
<BR>
Please wait while you are shuffled around the Internet to the current
location of the document.<BR>
<BR>
</BODY>
</HTML>
Notes:
- The META ROBOTS noindex command tries to prevent search engines from indexing
the page at the old URL. The follow command tells search engines to follow
the link(s) on the webpage, i.e., to link to the new location of the essay.
- The META DESCRIPTION and TITLE lines are included in case
a search engine indexes this page.
- The META HTTP-EQUIV line automatically transfers the reader to the new URL.
- The text in the BODY of the page gives the user something to read
while his/her web browser is being redirected to the new URL.
As a redundant precaution, and also to give search engine robots something to index,
I also included a link to the new URL in the BODY of the page.
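If many documents move at once, the referring pages can be generated instead of typed by hand. The following Python sketch fills a shortened version of the above page with a new URL; the names REDIRECT_TEMPLATE and redirect_page are hypothetical, and the template omits the AUTHOR line and some of the BODY text for brevity.

```python
from html import escape

# A shortened version of the referring page shown above.
REDIRECT_TEMPLATE = """<HTML>
<HEAD>
<META NAME="ROBOTS" CONTENT="noindex, follow">
<META NAME="DESCRIPTION" CONTENT="this page has moved to {shown}">
<META HTTP-EQUIV="REFRESH" CONTENT="{delay}; URL={url}">
<TITLE>new URL for this document</TITLE>
</HEAD>
<BODY>
The document that you have requested is now located at
<A HREF="{url}">{shown}</A>
</BODY>
</HTML>
"""


def redirect_page(new_url: str, delay_seconds: int = 1) -> str:
    """Return the HTML of a page that forwards the reader to new_url."""
    url = escape(new_url, quote=True)            # safe inside an attribute value
    shown = escape(new_url.split("://", 1)[-1])  # visible text without "http://"
    return REDIRECT_TEMPLATE.format(url=url, shown=shown, delay=delay_seconds)


page = redirect_page("http://www.rbs2.com/lt.htm")
print(page)
```

The result would then be saved under the old file name at the old URL.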
The referring page at the old URL should remain on the Internet for at
least six months, preferably at least a year, so that all of the
search engines that have indexed the old URL will have done a routine
crawl and discovered that the URL has changed.
Also, the new URL needs to be added to search engines in the usual way.
In the meantime, the referring page at the old URL redirects users
who have found a link to the old URL.
I deleted all of my documents from CompuServe at the end of February 2001,
30 months after I established my professional website.
However, you can find a current copy of a referring page similar to the
above code at
www.rbs2.com/privacy.html.
Some search engines (e.g., AltaVista, Google, AllTheWeb) can be used to
find pages that contain a link to a specific URL. In this way, one can
find links that need to be changed. One hopes that the author of each
link has put his/her e-mail address at the bottom of the page, or at the
bottom of his/her homepage, so they can be notified of the changed URL.
Link Checkers
Approaching the same problem from the other direction, there are
services that will automatically check all of the links on your page
and report dead links to you:
- World Wide Web Consortium
- Håkan Svensson in Sweden
- anybrowser.com
the robots.txt file
In the old days of the Internet, one submitted each file name
to a search engine for listing. Modern practice is to submit only
the domain name (e.g., http://www.sitename.com/ )
to the search engine,
which will then send out a "robot" or "spider" to crawl through
all of the webpages at that site and bring back data on these pages
to list in the search engine. Once a website is listed in a search engine's
database, robots/spiders from that search engine will periodically
crawl the website, to update the database. However, robots/spiders
only descend through a series of links, starting at the website's homepage,
and will not find documents that are not linked at the website.
To attempt to prevent a search engine from indexing a particular HTML document,
one can include the following line in the header of the HTML file:
<META NAME="ROBOTS" CONTENT="noindex">
However, not all robots/spiders obey this HTML command.
Another way to prevent robots/spiders from indexing a webpage is to
specify the name of that webpage in a robots.txt file.
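Before relying on the META command, it may be worth confirming that a page really contains it. The Python sketch below uses the standard html.parser module to collect the words in a ROBOTS line; the names RobotsMeta and robots_directives are hypothetical, and the sketch only reads the page; it says nothing about whether a given robot will obey.

```python
from html.parser import HTMLParser


class RobotsMeta(HTMLParser):
    """Collect the words found in <META NAME="ROBOTS" CONTENT="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = {name: (value or "") for name, value in attrs}
        if fields.get("name", "").lower() == "robots":
            for word in fields.get("content", "").split(","):
                if word.strip():
                    self.directives.add(word.strip().lower())


def robots_directives(html_text: str) -> set:
    parser = RobotsMeta()
    parser.feed(html_text)
    parser.close()
    return parser.directives


head = '<HEAD><META NAME="ROBOTS" CONTENT="noindex"></HEAD>'
print("noindex" in robots_directives(head))  # prints: True
```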
How often do robots/spiders visit a website?
My professional website (www.rbs2.com/), which has been
on the Internet since July 1998 and is included in all of the major
search engines, has a robots.txt file
that had an average of 30 hits/day during June 2001.
During February 2005, my robots.txt file had an average of
78 hits/day, when my whole website had 2030 hits/day.
Clearly, a robots.txt file is worthwhile if one wants to
exclude some webpages from search engines.
Before I describe the robots.txt file, it is worthwhile to
discuss why one might want to post a webpage that is hidden from
search engines. For example,
- Webpages that are content-free do not belong in search engines'
databases. For example:
All of the file names at my websites end in .htm; however,
some visitors type an .html extension or follow a link
with the wrong extension. To allow these visitors to see the page
that they seek (e.g., filename.htm),
I posted some pages (e.g., filename.html) that redirect visitors
who have used the wrong extension. It would be a waste of search engine
database space to include filename.html in a search engine,
since that file is content-free and only serves to direct a reader
to the correct file.
- Pages listed in search engines should give a person a reason
to go to a website. Other pages do not belong in search engines'
databases. For example:
- Terms of service (e.g., license.htm) and disclaimers
(e.g., disclaim.htm). These pages are important only
to visitors at a website, not a reason to visit a website.
- What's new at my professional website, www.rbs2.com/new.htm,
a webpage that I posted for repeat visitors to my professional website,
to tell them which essays I have revised since their last visit.
There is no reason to list new.htm in search engines.
- My webpage that lists my professional fees. I post that page
for potential clients who have already decided that I may be an
appropriate attorney or consultant for them. I don't want to
assist my competitors who are searching the Internet for terms like
"legal fees" or "consulting fees".
- Files that the webmaster wants to hide from the public, for example:
- The analysis of logfiles that shows the number of hits on each
webpage and the referring sites for visitors (i.e., the URL of the page that
contained the link that the visitor clicked to arrive at my webpage).
- A draft webpage that is being tested with an HTML validator and
shared with a few colleagues for their criticism, prior to
public posting.
example of robots.txt file
Put the following file in the root directory (i.e., the directory that
contains index.htm or homepage.htm) of a website:
User-agent: *
Disallow: /disclaim.htm
Disallow: /license.htm
This example User-agent command specifies that the following lines apply to
all robots/spiders. Each Disallow command excludes
from the robot's/spider's collection the one named file
that is located in the root directory.
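The effect of this example file can be checked offline with Python's standard urllib.robotparser module, as in the sketch below. The helper name allowed is hypothetical, and www.sitename.com is the same placeholder domain used earlier on this page; a well-behaved robot is expected to reach the same answers, but not every robot obeys the file.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt file from above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /disclaim.htm
Disallow: /license.htm
"""


def allowed(url: str, agent: str = "*") -> bool:
    """Return True if the example robots.txt lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


for name in ("disclaim.htm", "license.htm", "index.htm"):
    url = "http://www.sitename.com/" + name
    print(name, "allowed" if allowed(url) else "disallowed")
```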
resources for robots.txt files
Users who are interested in more complicated ways of using a robots.txt
file (e.g., websites that have subdirectories)
should use a search engine to search for "robots.txt"
and browse through various tutorials.
After uploading a robots.txt file to the root directory
of a website, that file should be tested. I have found the following
validators helpful:
Finally, test the robots.txt file by sending the robot from the
link checking program
to the root directory (e.g., http://www.sitename.com/)
without specifying a file name.
The results of the link checker should show the disallowed files
that are linked on the homepage
as "access denied for robots", or similar nomenclature.
Dashes and Greek Letters
For many years, the en-dash and em-dash (so-called because the
dashes are the width of the characters n and m, respectively)
were represented in HTML by the codes:
&#150; &#151;
Toward the end of 2001, the HTML validator at the W3C site
began declaring those codes invalid, because the proper SGML codes are:
&ndash; &mdash;
&#8211; &#8212;
The official table of SGML characters is posted at the
W3C website.
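Old pages can be repaired with a simple search-and-replace. The Python sketch below assumes that the old codes are the numeric references &#150; and &#151; (that is an assumption, as is the hypothetical function name fix_dash_codes) and rewrites them as the named codes.

```python
import html

# Assumed legacy codes on the left, valid named codes on the right.
LEGACY_TO_NAMED = {"&#150;": "&ndash;", "&#151;": "&mdash;"}


def fix_dash_codes(source: str) -> str:
    """Replace the legacy dash codes in HTML source with the named codes."""
    for old, new in LEGACY_TO_NAMED.items():
        source = source.replace(old, new)
    return source


fixed = fix_dash_codes("pages 10&#150;12 &#151; see below")
print(fixed)                 # prints: pages 10&ndash;12 &mdash; see below
print(html.unescape(fixed))  # shows the text as a browser would display it
```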
Greek Letters
The old way to display Greek Letters in HTML is to use the
Symbol font command:
<FONT FACE="Symbol"> l p </FONT>
l p
A table of the Greek letters is given by
Prof. Lovelock
at the University of Arizona.
The above-mentioned way of displaying Greek letters does not work in
Firefox or Google Chrome on an Apple computer in the year 2010.
Another way to include Greek letters in HTML documents is to use
HTML 4
characters:
alpha beta gamma delta epsilon Delta
&alpha; &beta; &gamma; &delta; &epsilon; &Delta;
α β γ δ ε Δ
A list of HTML4 character codes is given in a table by
Prof. Barzilai
at Salisbury University in Maryland.
One can also use Unicode:
alpha beta gamma delta epsilon Delta
α β γ δ ε Δ
α β γ δ ε Δ
Unicode charts of Greek letters and math symbols are available,
as is a menu of all Unicode charts.
There is also a helpful table of Unicode in HTML documents
(choose the option for hexadecimal numerical HTML encoding of the
Unicode character). Greek letters begin on page 4 (start=768).
The tables are by Tomas Schild of Tübingen, Germany.
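The relation between the HTML 4 names and the Unicode numbers can be listed with Python's standard html.entities module. The sketch below prints, for each of the six letters used above, the named code, the decimal code, and the hexadecimal code; the function name greek_codes is hypothetical.

```python
import html
from html.entities import name2codepoint


def greek_codes(name: str) -> tuple:
    """Return the named, decimal, and hexadecimal HTML codes for one letter."""
    code = name2codepoint[name]
    return (f"&{name};", f"&#{code};", f"&#x{code:X};")


for name in ("alpha", "beta", "gamma", "delta", "epsilon", "Delta"):
    named, decimal, hexadecimal = greek_codes(name)
    print(f"{name:8s} {named:10s} {decimal:7s} {hexadecimal:8s} {html.unescape(named)}")
```

All three codes for a letter stand for the same character, so the choice among them is a matter of readability of the HTML source.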
<LINK REL = ___> command
The HTML specification includes a rarely-used command that permits
a webbrowser to link to related documents (e.g., previous or next page
in a series of webpages, homepage, copyright notice for website, etc.) or to
send e-mail to the author of the website. One reason that this command
is little used is that most webbrowsers before the year 2002 did not support
this command.
Examples of how to use this command:
<HEAD>
<LINK REV="made" HREF="mailto:webmaster@xyz.com">
<LINK REL="home" HREF="index.htm" TITLE="Homepage">
<LINK REL="copyright" HREF="tos.htm" TITLE="Terms of Service">
</HEAD>
There are some "official" values of the LINK REL="___" phrase that
correspond to symbols shown in some webbrowsers. For a list of the official
values used by the i-Cab browser,
one of the first webbrowsers to implement this command,
see the FAQ under
"I want to add some LINK tags ...."
For more information on this command, type the following into a search engine:
"LINK REL" made index home copyright
and follow the results.
http://www.rbs0.com/hints.htm
created 14 July 1997, text revised 28 July 2013, links updated 1 June 2014