HTML & XHTML: The Definitive Guide

Appendix A. HTML Grammar

Contents:

Grammatical Conventions
The Grammar

For the most part, the exact syntax of an HTML or XHTML document is not rigidly enforced by a browser. This gives authors wide latitude in creating documents and gives rise to documents that work on most browsers, but are actually incompatible with the HTML and XHTML standards. Stick to the standards unless your documents are fly-by-night affairs.

The standards explicitly define the ordering and nesting of tags and document elements. This syntax is embedded within the appropriate Document Type Definition and is not readily understood by those not versed in SGML (for HTML 4.01, see Appendix D, "The HTML 4.01 DTD") or XML (for XHTML 1.0, see Appendix E, "The XHTML 1.0 DTD"). Accordingly, we provide an alternative definition of the allowable HTML and XHTML syntax, using a fairly common tool called a "grammar."

Grammar, whether it defines English sentences or HTML documents, is just a set of rules that indicates the order of language elements. These language elements can be divided into two sets: terminal (the actual words of the language) and nonterminal (all other grammatical rules). In HTML and XHTML, the words correspond to the embedded markup tags and text in a document.

To use the grammar to create a valid document, follow the order of the rules to see where the tags and text may be placed.
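As a rough illustration of how terminals and nonterminals fit together, the sketch below stores a toy grammar in a Python dictionary and expands a rule until only leaf symbols remain. The rule bodies are assumptions invented for this example and are not the rules from this appendix; plain_text is treated as a leaf because the toy grammar does not define it.

```python
# Toy grammar: each nonterminal maps to a list of alternatives, and each
# alternative is a sequence of symbols. The rule bodies are illustrative
# assumptions, not the actual rules of the HTML grammar.
GRAMMAR = {
    "dl_tag": [["<dl>", "dl_content", "</dl>"]],
    "dl_content": [["dt_tag", "dd_tag"]],
    "dt_tag": [["<dt>", "plain_text"]],
    "dd_tag": [["<dd>", "plain_text"]],
}


def is_nonterminal(symbol):
    """A symbol is a nonterminal here if the toy grammar has a rule for it."""
    return symbol in GRAMMAR


def expand(symbol):
    """Expand a symbol, always taking the first alternative of each rule."""
    if not is_nonterminal(symbol):
        return [symbol]
    result = []
    for part in GRAMMAR[symbol][0]:
        result.extend(expand(part))
    return result


print(" ".join(expand("dl_tag")))
# <dl> <dt> plain_text <dd> plain_text </dl>
```

Reading the output left to right shows where tags and text may appear, which is the same way the grammar is meant to be read by hand.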

A.1. Grammatical Conventions

We use a number of typographic and punctuation conventions to make our grammar easy to understand.

A.1.1. Typographic and Naming Conventions

For our grammar, we denote the terminals with a bold, monospaced Courier typeface. The nonterminals appear in italicized text.

We also use a simple naming convention for the majority of our nonterminals: if one defines the syntax of a specific tag, its name will be the tag name followed by _tag. If a nonterminal defines the various language elements that may be nested within a certain tag, its name will be the tag name followed by _content.

For example, if you are wondering exactly which elements are allowed within an <a> tag, you can look for the a_content rule within the grammar. Similarly, to determine the correct syntax of a definition list created with the <dl> tag, look for the dl_tag rule.
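The naming convention is mechanical enough to sketch in a few lines of Python. The helper name rule_names is hypothetical, and because the convention covers only the majority of nonterminals, a derived name is a starting point for a lookup rather than a guarantee that the rule exists.

```python
def rule_names(tag):
    """Return the (syntax rule, content rule) names for a tag.

    Hypothetical helper: it only applies the _tag / _content naming
    convention and does not check that either rule exists in the grammar.
    """
    bare = tag.strip("<>/ ").lower()
    return bare + "_tag", bare + "_content"


print(rule_names("<a>"))   # ('a_tag', 'a_content')
print(rule_names("<dl>"))  # ('dl_tag', 'dl_content')
```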

A.1.2. Punctuation Conventions

Each rule in the grammar starts with the rule's name, followed by the replacement symbol (::=) and the rule's value. We've intentionally kept the grammar simple, but we do use three punctuation elements to denote alternation, repetition, and optional elements in the grammar.
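The basic shape of a rule, a name followed by ::= and a value, can be split apart with a short Python sketch. The function name parse_rule and the sample rule are assumptions made for illustration. The sketch splits the value on whitespace only and does not interpret the punctuation for alternation, repetition, or optional elements.

```python
def parse_rule(line):
    """Split a 'name ::= value' rule into its name and a list of symbols."""
    name, separator, value = line.partition("::=")
    if not separator:
        raise ValueError("rule has no ::= replacement symbol")
    return name.strip(), value.split()


# Hypothetical rule text, used only to exercise the parser.
name, symbols = parse_rule("dl_tag ::= <dl> dl_content </dl>")
print(name)     # dl_tag
print(symbols)  # ['<dl>', 'dl_content', '</dl>']
```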

A.1.3. More Details

Our grammar stops at the tag level; it does not delve further to show the syntax of each tag, including tag attributes. For these details, refer to the Quick Reference card included with this book.

A.1.4. Predefined Nonterminals

The HTML and XHTML standards define a few specific kinds of content that correspond to various types of text. We use these content types throughout the grammar. They are:

literal_text

Text is interpreted exactly as specified; no character entities or style tags are recognized.

plain_text

Regular characters in the document character encoding, along with character entities denoted by the ampersand character.

style_text

Like plain_text, with physical- and content-based style tags allowed.
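The difference between literal_text and plain_text comes down to whether character entities are decoded. The Python sketch below uses the standard library's html.unescape to illustrate that distinction. The function names are hypothetical, and this is only a model of the idea, not a description of how any browser processes these content types. style_text is not modeled because it also involves tag handling.

```python
import html


def as_literal_text(source):
    """literal_text: characters are taken exactly as written."""
    return source


def as_plain_text(source):
    """plain_text: character entities such as &amp; are decoded."""
    return html.unescape(source)


sample = "AT&amp;T &lt;b&gt;"
print(as_literal_text(sample))  # AT&amp;T &lt;b&gt;
print(as_plain_text(sample))    # AT&T <b>
```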




Copyright © 2002 O'Reilly & Associates. All rights reserved.