Web Design in a Nutshell

Chapter 30. Introduction to XML

Contents:

Background
How It Works
XML Document Syntax
Document Type Definition (DTD)
Examples of XML Technology
Where to Learn More

XML (Extensible Markup Language) is a document encoding or markup standard approved by the World Wide Web Consortium (W3C). XML is not so much a language in itself (like HTML) as a set of rules for creating other markup languages: it is a metalanguage used to define other languages. If this all sounds highfalutin to you, think of it this way: XML provides a way for you to make up your own tags! This is a powerful new tool for exchanging meaningful information.

Consider these two examples, the first using standard HTML markup, the second using a markup language written according to the rules of XML:

<p>Bobby Five</p>
<p>4456</p>
<p>111.32</p>

<name>Bobby Five</name>
<accountNumber>4456</accountNumber>
<balance>111.32</balance>

The XML version says a lot more about the information contained in the tags. With meaningful markup tags, elements on the page aren't just headings and paragraphs: they become useful data. So while this information can be displayed on a page, it can just as easily be stored in a database (a common use of XML-formatted information). Using XML, various communities (business groups, scientists, trade associations) can now define a markup language to suit their particular needs for information exchange and processing over the Web.
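To make the "useful data" point concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree` module. Python is our choice for illustration only; nothing in this chapter depends on it. The sketch parses both snippets from the start of the chapter and shows that the HTML version yields only generic `p` tags, while the XML version yields tag names a program can use directly as field names. The wrapper element `<record>` and the helper names `tag_names` and `to_record` are hypothetical, introduced just for this example. The wrapper is needed because an XML document must have exactly one root element.

```python
import xml.etree.ElementTree as ET

# The two snippets from the text, each wrapped in a hypothetical <record>
# root element, because an XML document must have a single root element.
HTML_VERSION = """<record>
<p>Bobby Five</p>
<p>4456</p>
<p>111.32</p>
</record>"""

XML_VERSION = """<record>
<name>Bobby Five</name>
<accountNumber>4456</accountNumber>
<balance>111.32</balance>
</record>"""


def tag_names(document):
    """Return the tag name of every child of the root element."""
    root = ET.fromstring(document)
    return [child.tag for child in root]


def to_record(document):
    """Turn the XML version into a dict keyed by tag name, roughly what a
    program might do before storing the data in a database."""
    root = ET.fromstring(document)
    record = {child.tag: child.text for child in root}
    # Converting these two fields to numbers is our own choice here;
    # plain XML (without a schema) treats all element content as text.
    record["accountNumber"] = int(record["accountNumber"])
    record["balance"] = float(record["balance"])
    return record


if __name__ == "__main__":
    print(tag_names(HTML_VERSION))  # ['p', 'p', 'p']
    print(tag_names(XML_VERSION))   # ['name', 'accountNumber', 'balance']
    print(to_record(XML_VERSION))
    # {'name': 'Bobby Five', 'accountNumber': 4456, 'balance': 111.32}
```

With the HTML markup, a program would have to guess from position which paragraph holds the balance; with the XML markup, the tag names carry that meaning.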

XML can also be used to indicate the structure of specialized information that could not be represented using HTML alone, such as musical notation and mathematical formulas. Chapter 27, "Introduction to SMIL" illustrates how the XML-based language SMIL is used to assemble multimedia presentations. Chapter 31, "XHTML" discusses how the rules of XML have been applied to the HTML authoring language. We'll look at other examples of XML applications later in this chapter.

30.1. Background

The example at the beginning of this chapter highlights the limitations of HTML. HTML was designed specifically for displaying content in a browser, but isn't good for much else. When the creators of the Web needed a markup language that told browsers how to display web content, they used SGML guidelines to create HTML. SGML (Standard Generalized Markup Language) is a comprehensive set of syntax rules for marking up documents and data that has existed since the 1980s. It is the big kahuna of metalanguages! For information on SGML, including its history, see http://www.oasis-open.org/cover/general.html.

As the Web matured, it became clear that there was a need for more versatile markup languages. SGML provided a good model, but it was too vast and complex; it had many features that were unnecessary and wouldn't be used in the Web environment. XML is a simplified and reduced form of SGML, tailored to the needs of sharing information over the Internet. It is powerful enough to describe data, but light enough to travel across the Web. Much of the credit for XML's creation goes to Jon Bosak of Sun Microsystems, Inc., who started the W3C working group responsible for scaling down SGML to its portable, Web-friendly form.

As of this writing, XML is in Version 1.0, which was first issued in February 1998 and revised in October 2000. Various aspects and modules of XML are still in development. For more information and updates on the progress of the standard, see the W3C's site at http://www.w3.org/XML.

One of the first things the W3C did once it had XML in place was to apply it to the existing HTML specification. The resulting language is XHTML, which is simply HTML rewritten according to the stricter, yet more extensible, rules of XML. For more information on XHTML, see Chapter 31, "XHTML".
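One way to see what "stricter rules" means in practice is to feed HTML-style and XHTML-style markup to an XML parser. The minimal Python sketch below (again using the standard library's `xml.etree.ElementTree`, our choice for illustration) checks only well-formedness: an unclosed `<br>` or a start and end tag that differ in case are rejected, while the empty-element form `<br/>` is accepted. It does not validate against the XHTML DTD, and the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

# HTML tolerates an unclosed <br> and case-insensitive tag names.
# XML (and therefore XHTML) requires every element to be closed and
# treats tag names as case-sensitive.
HTML_STYLE = "<p>Line one<br>Line two</p>"
MIXED_CASE = "<P>Hello</p>"
XHTML_STYLE = "<p>Line one<br/>Line two</p>"


def is_well_formed(markup):
    """Return True if the string parses as well-formed XML.

    This is a well-formedness check only; it does not validate the
    markup against any DTD."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(HTML_STYLE))   # False: <br> is never closed
    print(is_well_formed(MIXED_CASE))   # False: <P> and </p> do not match
    print(is_well_formed(XHTML_STYLE))  # True
```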




Copyright © 2002 O'Reilly & Associates. All rights reserved.