CS73N

HTML

HTML INFO

Abstract by Gio Wiederhold, 19 Jan 2000. Minor updates Mar 2001, Jan 2002, March 2002, Feb July 2004, March 2005. Moved to CS73N Wiki 3 May 2007, but it needs a thorough review.

Objective

The HyperText Markup Language (HTML) was created by the high-energy physics community in the late 1980s to let its members rapidly share, over the Internet, textual documents with included images and live cross-references to other documents, creating hyperdocuments. HTML achieves that objective by embedding markups within the text. These markups can contain hyperlinks to other objects, given as Internet addresses and fetched using HTTP (the HyperText Transfer Protocol). HTML also supports markups that specify formatting within a document, inherited from its predecessor, SGML, the Standard Generalized Markup Language. That language was primarily used for typesetting manuals, but is now also used for some digital libraries.

HTML briefly

We describe only a few basic commands and markups of HTML. The common version in 2002 was HTML 4.0; HTML files using earlier versions should still work. In a browser you can inspect or save the source file (in MS IE click on [View], then on [Source]) and see the formatting that was used. The file you are now reading was created by writing HTML directly, and you can learn much by viewing it.
<META ..> commands give instructions to the browser. Not all browsers handle all commands, but they simply ignore the ones they don't recognize. Browsers don't have to treat commands in precisely the same way.

    In order to serve its clientele using MS Word, Excel, PowerPoint, etc., Microsoft has introduced many commands that can be understood only by its Internet Explorer. Those commands allow a more faithful representation of its files when saved as HTML. See the section OTHER below. Some of those commands appear bracketed as <xml ... or are prefixed to look like comments, <! ... . For our purpose they are mainly confusing. Try to ignore them.

Very useful information about the many options for HTML presentations is being maintained by John Pollock: The HTML Pit Stop

Conventions

HTML is an application conforming to ISO 8879 (Standard Generalized Markup Language, or SGML). SGML uses embedded directives or markups to indicate formatting, while leaving the interpretation to the client's display program and its knowledge about the screen, paper, user preferences, etc. These markups are bracketed by Less-Than (<) and Greater-Than (>) symbols. In this document we use UPPER CASE for all HTML directives shown, although lower- and upper-case markups are equivalent. I use upper case to find markups faster in text.
The combination <A ..> ... </A> was left undefined in SGML, so many hypertext-specific markups are bracketed by <A ..> and </A>. There are also special characters, which start with an ampersand (&).
Browsers may ignore anything in these <brackets> that they don't recognize. To be able to show the markups in this HTML document, we use some special symbols internally (see below).

General layout

Each document should start with a declaration
<!Doctype html public "-//W3O//DTD/ W3 HTML 2.0//EN">,
here indicating that the document conforms to HTML version 2.0 (though most browsers ignore the declaration), followed by
<HTML>.
Most commands have a corresponding closure; for instance, there should be a
</HTML> at the end of the document.

 

A document is split into a HEAD and a BODY.

The HEAD is for external information, such as the TITLE, which the browser shows in its window frame and uses as the external name of the page, i.e.,
<HEAD><TITLE>HTML information for CS99I book</TITLE>
That can be followed by a reference to the web page's own location, useful when trying to find out where one has gotten to on the web:
<BASE HREF="http://www-db.stanford.edu/pub/gio/CS99I/html-info.html">
</HEAD>

The BODY follows, i.e.,
<BODY> followed by everything in the document, until the closing </BODY>,
except for <! declarations not to be displayed >
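
Putting these pieces together, a minimal document might look like the sketch below. The title, heading, and paragraph text are placeholders, not taken from the course pages, and the declaration shown is the standard HTML 4.01 Transitional one rather than the 2.0 declaration quoted earlier.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD>
  <!-- Placeholder title; the browser shows it in its window frame -->
  <TITLE>My first page</TITLE>
</HEAD>
<BODY>
  <H1>My first page</H1>
  <P>Everything the reader sees goes in the BODY.</P>
</BODY>
</HTML>
```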

Headers and paragraph breaks

There are six levels of section headers:
<Hx>heading text</Hx> where x = 1..6
We use <H1> for the chapter headings, <H2> for the major sections, and <H3> for subsections.

<P> starts a paragraph, to be terminated with </P>,
and
<BR> forces a linebreak (used liberally in this document).

Lists are of three types:
<yL> list: <UL> unnumbered; <OL> numbered; <DL> definition
Each entry in a <UL> or <OL> list starts with <LI> and ends with </LI>; in a <DL> list each term starts with <DT> and its definition with <DD>,
and the list is terminated by </yL>.
List commands such as <OL>, </OL> are also (mis)used to provide indenting of text.
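
As an illustration, here is one small list of each type. The entries are invented for this sketch.

```html
<!-- Unnumbered list: the browser supplies the bullets -->
<UL>
  <LI>apples</LI>
  <LI>pears</LI>
</UL>

<!-- Numbered list: the browser supplies 1, 2, 3 -->
<OL>
  <LI>open the file</LI>
  <LI>edit it</LI>
  <LI>save it</LI>
</OL>

<!-- Definition list: terms (DT) paired with definitions (DD) -->
<DL>
  <DT>HTML</DT> <DD>HyperText Markup Language</DD>
  <DT>SGML</DT> <DD>Standard Generalized Markup Language</DD>
</DL>
```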

Normally you want to leave as much formatting as possible to the browser, since it will adjust itself to the available page size and the reader's preferences, but formatting can be disabled by bracketing
<PRE> preformatted as is </PRE>.

Cross References

The ability to refer to other documents is the main innovation of HTML.
<A HREF="filename"> mousearea </A> as in
<A HREF="http://db.stanford.edu/pub/gio/CS99I/intro.html">CS99I Introductory Chapter</A>
If you want the new page to open in a new window on the screen, place the attribute target="_blank" before the closing > of the <A HREF ..> markup, i.e.,
<A HREF="http://db.stanford.edu/pub/gio/CS99I/intro.html" target="_blank">.
Programs that help you create web pages generally have some way to insert the target="_blank" attribute.
HREF links also work for files that are in other formats, if your browser has the appropriate plugin, say Ghostscript for
<A HREF="http://db.stanford.edu/pub/gio/slides/atarpa.ps">ARPA postscript slides</A>.

 

A webpage author can use icons or images instead of text as the point of departure.
For instance, an icon can serve as the link back to How to Write for the Web.

One can also use a hyperlink to go into the middle of a document, if an anchor name has been given to the desired entry point:

<A HREF="#SecSix">Section 6</A> --> <A NAME="SecSix">
(Note: The NAME= definition appears not to work inside of TABLEs)
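
As a rough illustration of how the two halves fit together, the fragment below places a named anchor at a section heading and links to it, first from within the same page and then from another page. The file name notes.html and the heading text are invented for this sketch.

```html
<!-- The link: jumps to the anchor named SecSix in the same document -->
<A HREF="#SecSix">Go to Section 6</A>

<!-- The same entry point, reached from a different page (hypothetical file name) -->
<A HREF="notes.html#SecSix">Section 6 of the notes</A>

<!-- The target: an anchor placed where the reader should land -->
<H2><A NAME="SecSix">Section 6</A></H2>
```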

Images

There are many image formats, and two ways to show them. Images can be embedded, so that they always show, or referenced, requiring a click:

    Embedded: <IMG ALIGN=top/middle SRC="imagefilename.format">, say
    <IMG SRC="../gifs/exclaim.gif"> to enter into text.

    Referenced as distinct documents, requiring a click:
    <A HREF="http://infolab.stanford.edu/pub/gio/gifs/exclaim.gif"> to show exclaim.gif</A>.

Standard formats are

  1. .gif (Graphics Interchange Format), a simple graphic image format used with HTML
  2. .jpg or .jpeg, a compressed format for images defined by the Joint Photographic Experts Group.
  3. .tiff (Tagged Image File Format), defined by Aldus (now part of Adobe) and Microsoft.
  4. .bmp for Bitmaps, a PC format
  5. .xbm for XBitmaps, a UNIX format
  6. .mpg or .mpeg a compressed format for video.
  7. .mp3 a compressed format for audio and music.

Most browsers can handle these formats, or have plugins (optional software) for them, but there are many more specialized standards, such as those used for architectural design, for X-rays, etc.


To put a 3-pixel border around a picture of a worm we can write <IMG SRC="../gifs/nematode.jpg" border="3">.


By specifying explicit image sizes you will cause the browser to expand or shrink the image. For example, we doubled the width of an 87 x 32 pixel bee
by entering: <img src="../gifs/Bee.gif" width="174" height="32">
One can also map clickable areas within an image. The original-size bee was specified with
    <img src="../gifs/Bee.gif" width="87" height="32" usemap="#Map">
and the map is defined so that clicking the head will take you UP to the top, and clicking the tail will take you DOWN, but one could equally well go to remote webpages.
    <map name="Map">
    <area shape="rect" coords="0,0,32,32" href="#UP">
    <area shape="rect" coords="55,0,87,32" href="#DOWN">
    </map>
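
For the map to work, the page also needs anchors with the names that the areas point to. The sketch below shows one possible arrangement, assuming anchors named UP and DOWN; the heading text and the ALT texts are additions for illustration only.

```html
<!-- Target for the head of the bee: placed near the top of the page -->
<A NAME="UP"></A>
<H1>Top of the page</H1>

<!-- The clickable image and its map -->
<IMG SRC="../gifs/Bee.gif" WIDTH="87" HEIGHT="32" USEMAP="#Map" ALT="Bee">
<MAP NAME="Map">
  <!-- Left 32 pixels (the head) and right 32 pixels (the tail) -->
  <AREA SHAPE="rect" COORDS="0,0,32,32" HREF="#UP" ALT="Up">
  <AREA SHAPE="rect" COORDS="55,0,87,32" HREF="#DOWN" ALT="Down">
</MAP>

<!-- Target for the tail of the bee: placed near the bottom of the page -->
<A NAME="DOWN"></A>
```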
In UNIX use xv to edit images.

email addresses

Use
<A HREF="mailto:gio@cs.stanford.edu">email to: gio@cs.stanford.edu</A>
to insert a mailing address. The text between the opening <A ..> and the closing </A> is arbitrary.

Other useful formatting commands

<BLOCKQUOTE> for quotations</BLOCKQUOTE>
<ADDRESS> for addresses </ADDRESS>
<CENTER> text </CENTER>

Special characters

Most of these symbols start with & (see also the ISO 8859-1 encodings). Not all browsers interpret all of these characters, as you might notice below in the parentheses ():
  1. &lt for <
  2. &gt for >
  3. &amp for &
  4. &quot for "
  5. &nbsp for a non-word-breaking space ( ), compare size to nothing () between the parentheses
  6. &ndash for a short (n-sized) dash (–)
  7. &mdash for a long (m-sized) dash (—)
  8. &shy for a soft hyphen, which shows only where a word is broken at the end of a line
  9. &auml for a-umlaut (ä), &ouml for o-umlaut (ö)
  10. &#169 or &copy; copyright symbol (©)
  11. &#153 or &trade; trademark symbol (™); &#153 relies on the Windows character set, so &trade; is the safer choice
and many others. A semicolon should follow each symbol to terminate it; the semicolon will not show.
<NULL> creates an invisible break, useful when combining special and ordinary characters, while &nbsp; creates a space ( ) that does not break, like a blank character.
An <HR> creates a horizontal rule across the page.
<U> ... </U> brackets text to be underlined (maybe); <underline> is not a standard markup.
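
To show how these symbols are used in practice, the fragment below is a small sketch (the sentences are made up) that displays a markup literally instead of having the browser interpret it, and mixes a few other symbols into ordinary text.

```html
<!-- &lt; and &gt; keep the browser from treating <B> as a markup -->
<P>Write &lt;B&gt;bold&lt;/B&gt; to get <B>bold</B> text, and R&amp;D for research and development.</P>

<!-- &nbsp; keeps "pages" and the numbers on the same line -->
<P>Copyright &copy; 2002, pages&nbsp;10&ndash;12, Z&uuml;rich.</P>
```

A browser shows the first paragraph as: Write <B>bold</B> to get bold text, and R&D for research and development.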

More characters are denoted numerically as &#nnn;, where nnn is the sum of the row and column numbers in the table below:

All 256 1 byte characters

Note that any characters your browser does not understand come out funny or as entered. I hope none crash your browser.

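  + |  0  1  2  3  4  5  6  7  8  9 | 10 11 12 13 14 15 16 17 18 19 |
  0 |                               |                               |
 20 |                               |           !  "  #  $  %  &  ' |
 40 |  (  )  *  +  ,  -  .  /  0  1 |  2  3  4  5  6  7  8  9  :  ; |
 60 |  <  =  >  ?  @  A  B  C  D  E |  F  G  H  I  J  K  L  M  N  O |
 80 |  P  Q  R  S  T  U  V  W  X  Y |  Z  [  \  ]  ^  _  `  a  b  c |
100 |  d  e  f  g  h  i  j  k  l  m |  n  o  p  q  r  s  t  u  v  w |
120 |  x  y  z  {  |  }  ~          |     ƒ              ˆ     Š    |
140 |  Œ     Ž                      |        ˜     š     œ     ž  Ÿ |
160 |     ¡  ¢  £  ¤  ¥  ¦  §  ¨  © |  ª  «  ¬     ®  ¯  °  ±  ²  ³ |
180 |  ´  µ  ¶  ·  ¸  ¹  º  »  ¼  ½ |  ¾  ¿  À  Á  Â  Ã  Ä  Å  Æ  Ç |
200 |  È  É  Ê  Ë  Ì  Í  Î  Ï  Ð  Ñ |  Ò  Ó  Ô  Õ  Ö  ×  Ø  Ù  Ú  Û |
220 |  Ü  Ý  Þ  ß  à  á  â  ã  ä  å |  æ  ç  è  é  ê  ë  ì  í  î  ï |
240 |  ð  ñ  ò  ó  ô  õ  ö  ÷  ø  ù |  ú  û  ü  ý  þ  ÿ             |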
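
As a worked example of the row-plus-column rule: the letter A sits in row 60, column 5, so its number is 60 + 5 = 65; the copyright sign sits in row 160, column 9, giving 169; and a-umlaut sits in row 220, column 8, giving 228. The fragment below is a small sketch with invented text.

```html
<!-- &#65; is A (60 + 5); &#169; is the copyright sign (160 + 9); &#228; is a-umlaut (220 + 8) -->
<P>&#65; is for apple. &#169; 2002. M&#228;rchen.</P>
```

A browser displays this as: A is for apple. © 2002. Märchen.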

Font Styles

Styles, relative sizes, and colors can be indicated, but your browser chooses the actual representation.
<FONT ..> takes options; to increase the size by 1, say, write <FONT SIZE=+1>, which applies until </FONT>.
The type of font can be changed using the attribute FACE, as FACE="ARIAL",
and/or the color can be set with COLOR=BLUE, again until </FONT>.
The directives <SMALL> ... </SMALL> and <BIG> ... </BIG> can be placed to bracket text. The change in size is not great.
Logical styles
<EM> Emphasis, usually italics </EM>; we use these for words cited in the glossary.
<STRONG> Strong emphasis, usually bold </STRONG>
<CITE> book, journal citation, usually italics </CITE>
<KBD> typing font </KBD>; we use these for examples of type-ins.
<VAR> substitution example font </VAR>
Physical styles
<B> bold </B>
<I> italic </I>
<TT> typewriter </TT>
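
The fragment below is a small sketch that combines several of these styles; all of the text in it is invented. Note that every opening markup is matched by a closing markup with a slash.

```html
<P>
  <FONT SIZE="+1" FACE="Arial" COLOR="blue">Larger blue text in Arial</FONT>,
  followed by <SMALL>small</SMALL> and <BIG>big</BIG> text.
</P>
<P>
  <!-- Logical styles: the browser decides how to show them -->
  A <EM>glossary term</EM>, a <STRONG>warning</STRONG>,
  a <CITE>Title of a Book</CITE>,
  and something to type: <KBD>ls -l</KBD>.
</P>
<!-- Physical styles: the author asks for a specific look -->
<P><B>bold</B>, <I>italic</I>, and <TT>typewriter</TT> text.</P>
```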

Tables

We just show a summary example.
<TABLE> or <TABLE BORDER=3> or <TABLE CELLSPACING=2> (2 is the standard spacing)
<CAPTION> one line only, centered, plain, last line wins </CAPTION>
<TR> <TH> a row of centered (default) header items </TH> <TH> more </TH> <TH> for as many columns as wanted </TH>, terminated by </TR>
<TH WIDTH=pixels> or <TH WIDTH=percent%>; CENTER is the default alignment for headers.
<TR> <TD> a row of data fields </TD> <TD> more data </TD> <TD> field with left-aligned data (default) </TD> </TR>
<TR> <TD> more rows; joint field width is automatic, multi-line is automatic </TD> <TD> </TD> </TR>
<TD ..> or <TH ..> options include
ALIGN=LEFT or CENTER or RIGHT
NOWRAP to keep the text of a cell on a single line
COLSPAN=n (1 is standard) to make boxes that span more than one column
ROWSPAN=n (1 is standard) to make boxes that span more than one row
VALIGN=TOP or MIDDLE or BOTTOM or BASELINE

</TABLE>

By default the browser aligns tables automatically; an example without SPACING, WIDTH, ALIGN, or SPAN options is seen above in the table of characters.
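
A complete small table may make the summary easier to follow. The sketch below uses made-up data; it has a header row, one cell spanning two rows, and one cell spanning all three columns.

```html
<TABLE BORDER="1" CELLSPACING="2">
  <CAPTION>Fruit order (made-up data)</CAPTION>
  <!-- Header row: TH cells are centered by default -->
  <TR> <TH>Fruit</TH> <TH>Color</TH> <TH>Quantity</TH> </TR>
  <!-- The Color cell spans this row and the next -->
  <TR> <TD>apple</TD> <TD ROWSPAN="2" VALIGN="MIDDLE">red</TD> <TD ALIGN="RIGHT">12</TD> </TR>
  <!-- Only two cells here: the middle column is already filled from above -->
  <TR> <TD>cherry</TD> <TD ALIGN="RIGHT">200</TD> </TR>
  <!-- One cell across all three columns, kept on a single line -->
  <TR> <TD COLSPAN="3" NOWRAP>Prices are not shown in this table.</TD> </TR>
</TABLE>
```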

Paragraph styles

The paragraph bracket allows setting a variety of styles, but their interpretation can vary by browser. Useful are:

<P STYLE='MARGIN-LEFT:0.5in'> to provide a half inch indent, used here
<P STYLE='TAB-STOPS:2.5in'> ?? how to use ?? ;
<TAB ID=tabname> and <TAB TO=tabname> seem only to have been proposed;
multiple style entries can be placed within the quotes and separated with a semicolon (;).

text-of-length-for-tabother text
tabbed material on next line.
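
The indent and the semicolon-separated form can be sketched as follows. The color and font-style entries are standard style properties picked only for illustration; they do not appear in the original notes.

```html
<!-- A single style entry: half an inch of left margin -->
<P STYLE="margin-left: 0.5in">This paragraph is indented by half an inch.</P>

<!-- Several entries within the same quotes, separated by semicolons -->
<P STYLE="margin-left: 0.5in; color: blue; font-style: italic">
  This paragraph is indented, blue, and italic.
</P>
```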

Long Tables

Tables can be very long; for instance, a list of all 84 Hitchcock movies has been manually split into 4 distinct tables. Long tables take long to load, and are hard to manage with scrollbars. We can use an option, DATAPAGESIZE, in the TABLE specification to split the presentation of a long table, as
<TABLE DATAPAGESIZE=8 ID=table>. To allow manipulation of that table we add a button,
<INPUT TYPE="button" VALUE="Next" ONCLICK="table.nextPage();">,
whose click refers to that table's ID. Now we can look at all of Hitchcock's films page by page, although they are stored as a single HTML table.

remainder of section is not completed

Comments

There are two levels of comments:

  1. Comments intended for the systems that process HTML text. Those start with <!command, where command is to be understood by the processor, as `Doctype' in the header of an HTML file. Such a comment is terminated by a > character.
  2. Internal comments, intended for the persons maintaining an HTML file. Those start with <!-- and are terminated by -->.
Comments should not extend over more than one line.
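
Both kinds can be seen side by side in the short sketch below. The declaration is the standard HTML 4.01 Transitional one, and the wording of the internal comment is invented.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<!-- Internal comment for the maintainer: check the links every term -->
<P>This text is displayed; the two lines above it are not.</P>
```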

Counters

The counter we are using for the Web-book is installed on the server bergman.stanford.edu. An example would be:

Hits since 15 Jan 2000:

More information about such a counter can be found at the Hitometer counter's home page.

HTML Checkers

To check HTML files for correctness, you may want to use an HTML checker. One was made by an independent company (Web Site Garage), which was bought by Netscape in late 1998. It is now (Jan 2002) at Web Site Garage of Netscape.

OTHER

<META> introduces meta commands, meant for search engines and hidden from the user. They are often misused to obtain high rankings, as in
<META Money Money Money Money> to give the impression that this web page is financially worthwhile.
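
In standard HTML a meta command carries a NAME (or HTTP-EQUIV) attribute together with a CONTENT attribute. The sketch below shows the usual form; the title, description, and keywords are invented for illustration.

```html
<HEAD>
  <TITLE>HTML information</TITLE>
  <!-- Read by search engines; not displayed to the reader -->
  <META NAME="description" CONTENT="A short introduction to basic HTML markups.">
  <META NAME="keywords" CONTENT="HTML, markup, hyperlinks, tutorial">
  <!-- An instruction to the browser: the character set of the page -->
  <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
</HEAD>
```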


Current versions of Microsoft Word and PowerPoint have the option to convert their documents to HTML, and vice versa. But since the capabilities of HTML browsers don't match the capabilities of MS Word, the result is often imperfect. Some subsequent manual editing can make the result look much better.

<STYLE> introduces the use of style templates to improve the look, but makes HTML much less general. Specific Microsoft styles within the style section are bracketed by

    <!--
          /* Style Definitions */
    p.MsoNormal, li.MsoNormal, div.MsoNormal
    and
    -->
Omitting them will change the look, but should not change the contents.

Items that require XML processing are introduced with

    <!--[if gte mso 9]><xml>
    to </xml><![endif]-->
so they can be skipped by other browsers and by browsers older than those delivered with Microsoft Office version 9.

See also the CS99I references.


Last Modified 2007-05-03