XHTML Grammar Overview

Original Author(s): Markus Gylling

There are about 100 elements in the XHTML Document Type family. Here, only a subset will be introduced.

Refer to the Read More section for more details.

If you have not yet reviewed the XML Syntax Introduction section, you are advised to do so before proceeding.

Root element
<html>
Children of root
The XHTML DTD defines that the root element <html> can have only two children: <head> and <body>.

  <html>
   <head>...</head>
   <body>...</body>
  </html>

The <head> element contains data (children) such as meta information that is not necessarily presented to the user of the completed document.

The <body> element contains all of the document data/text. The text of the document occurs in different elements, all of which are nested inside <body>.

According to the XHTML DTD, <head> must come before <body>, thus the following example is invalid:


  <html>
   <body>...</body>
   <head>...</head>
  </html>

but the following is valid:


  <html>
   <head>...</head>
   <body>...</body>
  </html>
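
The ordering rule can also be checked programmatically. The following is a minimal sketch in Python, assuming the document is well-formed and, like the examples above, carries no namespace declaration. The function name has_valid_root_children is hypothetical, and the check covers only this one rule, not full DTD validation.

```python
import xml.etree.ElementTree as ET

def has_valid_root_children(xhtml_text):
    """Return True if the root is <html> with exactly <head> followed by <body>."""
    root = ET.fromstring(xhtml_text)
    child_names = [child.tag for child in root]
    return root.tag == "html" and child_names == ["head", "body"]

valid = "<html><head></head><body></body></html>"
invalid = "<html><body></body><head></head></html>"
print(has_valid_root_children(valid))    # True
print(has_valid_root_children(invalid))  # False
```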

title
The <title> element contains the document title.
The text node of <title> will not be displayed as a part of the document itself, but may be displayed or used in other ways, such as in the caption bar of the browser window.
meta
<meta /> elements contain meta level information about the document.
Since the meta element is empty, the information is contained within attributes instead of text nodes.

  ...
  <head>
    <title>A Farewell To Arms</title>
    <meta
      name='dc:title'
      content='A Farewell To Arms' />
    <meta
      name='dc:creator'
      content='Ernest Hemingway' />
  </head>
  ...
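
Because the information sits in attributes, a program reads it from the name and content attributes rather than from text nodes. A small sketch in Python, assuming the head fragment above is available as a well-formed string (the variable names are illustrative only):

```python
import xml.etree.ElementTree as ET

head = ET.fromstring("""<head>
  <title>A Farewell To Arms</title>
  <meta name='dc:title' content='A Farewell To Arms' />
  <meta name='dc:creator' content='Ernest Hemingway' />
</head>""")

# Each <meta /> element is empty; its data lives in the two attributes.
metadata = {meta.get("name"): meta.get("content") for meta in head.findall("meta")}
print(metadata)
# {'dc:title': 'A Farewell To Arms', 'dc:creator': 'Ernest Hemingway'}
```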

For more information on metadata, refer to the DTB Metadata Section.

XHTML allows six heading levels. Each level increment describes one additional step down the hierarchical structure. The element names to be used are: <h1>, <h2>, <h3>, <h4>, <h5>, <h6>.

In DAISY DTBs, heading levels must not be skipped: a heading may be at most one level deeper than the heading that precedes it.

The following example is forbidden:


  <h1>Chapter 1</h1>
  <p>Paragraph text</p>
  <h3>Chapter 1.1.1</h3>
  <p>Paragraph text</p>

The following example is correct:


  <h1>Chapter 1</h1>
  <p>Paragraph text</p>
  <h2>Chapter 1.1</h2>
  <p>Paragraph text</p>
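
A rough way to spot skipped levels is to compare each heading with the one before it. The sketch below is in Python and rests on stated assumptions: it scans the markup with a regular expression rather than a full parser, and the function name find_skipped_levels is hypothetical. It reports only steps that go more than one level deeper.

```python
import re

def find_skipped_levels(xhtml_text):
    """Return (previous, current) heading-level pairs where a level was skipped."""
    # Matches opening tags such as <h1> or <h2 class="...">, but not <head> or <hr />.
    levels = [int(n) for n in re.findall(r"<h([1-6])[\s>]", xhtml_text)]
    return [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]

forbidden = "<h1>Chapter 1</h1><p>Paragraph text</p><h3>Chapter 1.1.1</h3>"
correct = "<h1>Chapter 1</h1><p>Paragraph text</p><h2>Chapter 1.1</h2>"
print(find_skipped_levels(forbidden))  # [(1, 3)]
print(find_skipped_levels(correct))    # []
```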

A very common element in XHTML is the paragraph element.

In XHTML the element name for paragraph is <p>.


  <p>Paragraph text</p>

XHTML includes two types of lists. The first, unordered, is often called a "bullet list". Syntax in XHTML is:


  <ul>
    <li>list item 1</li>
    <li>list item 2</li>
    <li>list item 3</li>
  </ul>

As shown above, in the unordered list <ul> is the parent of any number of <li> children.

The second list type in XHTML, the ordered list, uses the same syntax, but the element name is <ol>. This produces numbered list items.


  <ol>
    <li>list item 1</li>
    <li>list item 2</li>
    <li>list item 3</li>
  </ol>

Definition lists are used to define terms and words. Three elements are used in combination:
<dl> definition list
<dt> definition term
<dd> definition data


  <dl>
   <dt>XML</dt>
   <dd>Abbreviation for
     eXtensible Markup Language</dd>
   <dt>XHTML</dt>
   <dd>An XML DTD, XHTML is an
     abbreviation for
     eXtensible HyperText Markup Language</dd>
  </dl>

As shown above, the definition list <dl> is the parent of any number of paired <dt> and <dd> children.

The elements span and div are used when no other element in the XHTML DTD is suitable to describe what kind of text the element contains.

For example, in the XHTML DTD there is no element for a "page". Instead, we use the span element, and add a class attribute to describe what the element represents.

There are several class attributes used in Daisy 2.02 to specify element content.


  <span class="page-normal">23</span>
  <span class="page-front">IV</span>
  <span class="page-special">A-10</span>
  <span class="noteref">1</span>
  <div class="notebody">notebody text</div>
  <span class="sidebar">sidebar text</span>

Note that you must use the class attribute values above when including these types of text/data in Daisy 2.02 DTBs.

Read more about special element usage in DAISY DTBs in the DAISY XHTML Element Usage Requirements Section.

Note that you are free to create class attribute values of your own. XHTML specifies element names, but it does not specify class attribute values. It is your responsibility to create semantically meaningful values for the class attributes you use.


  Example of custom class attribute values:

  <span class="sent">
    <span class="wrd">This</span>
    <span class="wrd">is</span>
    <span class="wrd">a</span>
    <span class="wrd">sentence</span>
    <span class="dot">.</span>
  </span>


Images are included in the document using the img element. Two important attributes are added: src, which is the link (URL) to the image, and alt, which is a short text describing the image.


  <img
    src="flower01.jpg"
    alt="An image of a flower" />


  <img
    src="http://www.botanica.org/gfx/flower01.jpg"
    alt="An image of a flower" />

When a short text equivalent does not suffice, provide additional information in a file referenced in the longdesc attribute:


  <img
    src="http://www.botanica.org/gfx/flower01.jpg"
    alt="An image of a flower"
    longdesc="/flower01.html"
  />

The XHTML DTD differentiates between inline and block elements.

Block elements are elements that may contain text nodes or other elements as children.

Inline elements normally contain only text nodes and other inline elements, NOT block elements.

Inline elements may be nested within block elements, but block elements may not be nested within inline elements.


 <p>This paragraph has an
   <em>emphasis</em>
   element nested inside it
 </p>

The above example is allowed, because <p> is block and <em> is inline.

But the below example is not allowed (invalid) because <span> is inline and <h1> is block.


  <span>This span has a
    <h1>heading</h1>
    element nested inside it, which
    is not allowed.
  </span>
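
The nesting rule can be approximated in a few lines of Python. This is a sketch under assumptions: the BLOCK and INLINE sets below list only some of the elements mentioned in this overview, not the full classification in the DTD, and the function name find_block_in_inline is hypothetical. Only direct children are inspected.

```python
import xml.etree.ElementTree as ET

# Partial, illustrative classification; the DTD defines the complete sets.
BLOCK = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "ul", "ol", "dl", "table", "pre"}
INLINE = {"span", "em", "strong", "kbd", "q", "a", "code"}

def find_block_in_inline(xhtml_text):
    """Return (inline_parent, block_child) tag pairs that break the nesting rule."""
    root = ET.fromstring(xhtml_text)
    violations = []
    for parent in root.iter():  # iter() visits the root and every descendant
        if parent.tag in INLINE:
            for child in parent:
                if child.tag in BLOCK:
                    violations.append((parent.tag, child.tag))
    return violations

allowed = "<p>This paragraph has an <em>emphasis</em> element nested inside it</p>"
not_allowed = "<span>This span has a <h1>heading</h1> element nested inside it.</span>"
print(find_block_in_inline(allowed))      # []
print(find_block_in_inline(not_allowed))  # [('span', 'h1')]
```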

Some commonly used inline XHTML elements are:

<em>: emphasis

 <p>It is <em>very</em>
   important to understand this.
 </p>

<strong>: strong emphasis

 <p>
   We <strong>strongly recommend</strong>
   that you try this at home.
 </p>

<kbd>: keyboard (indicates this is a computer keyboard shortcut).

 <p>
   To close the program, you may use the
     <kbd>
       Alt+F4
     </kbd>
   keyboard shortcut.
 </p>

<q>: inline quote

 <p>
   And then he said
   <q>
     I think I understand this.
   </q>
 </p>

<span>: with class attribute

 <p>And then he asked
   <span class='question'>
     Do you think this is easy?
   </span>
 </p>

In most browsers, the visual display differs between inline and block elements; a block element causes a line break, but an inline element does not.

Example: Use of the span element will cause all text nodes to appear on the same line, because span is an inline element.


 <div>
  <span>span element</span>
  <span>span element</span>
 </div>

Example: Use of the div element will cause text nodes to appear on one line each, because div is a block element.


 <div>
  <div>div element</div>
  <div>div element</div>
 </div>

Below is the result of the above two examples:

span element span element
div element
div element

The table element is used with the following children in combination:
<tr> table row
<td> table data or table cell
<th> table header or column heading


 <table summary="Table summary text">
  <caption>Table caption</caption>
  <tr>
    <th>Column 1</th>
    <th>Column 2</th>
  </tr>
  <tr>
    <td>Cell 1</td>
    <td>Cell 2</td>
  </tr>
  <tr>
    <td>Cell 3</td>
    <td>Cell 4</td>
  </tr>
 </table>

Hyperlinks are created using the anchor element (<a>) in combination with the href attribute.

The hyperlink points to another resource: a location in the same document, or another document somewhere else on the Web. The current document is the source of the link; the value of the href attribute, a URL, is the target.


  <a href="http://www.daisy.org">
    DAISY website
  </a>

The target resource can either be another document, or a fragment of a document.

When a fragment is referenced, the target is identified by its id attribute value.


    <a href="news.html#workshop">
      DAISY Workshop News
    </a>

    [pointing to:]

    <h3 id="workshop">DAISY Workshops 2003</h3>
    <p>A workshop is being held in August...</p>
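
To illustrate how the fragment part of an href maps onto an id, here is a small Python sketch. It assumes the target document is well-formed and already loaded as a string; the function name fragment_target and the shortened news document are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

def fragment_target(href, target_document_text):
    """Return the element whose id equals the fragment of href, or None."""
    fragment = urldefrag(href).fragment  # the part after '#'
    root = ET.fromstring(target_document_text)
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    return None

news = ('<body><h3 id="workshop">DAISY Workshops 2003</h3>'
        '<p>A workshop is being held in August...</p></body>')
target = fragment_target("news.html#workshop", news)
print(target.tag, target.text)  # h3 DAISY Workshops 2003
```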

For compatibility with older user agents, the id attribute value of the target can also be duplicated in the name attribute of an anchor element.


    <h3>
      <a id="workshop" name="workshop">
        APCD Workshops 2003
      </a>
    </h3>
    <p>A workshop is being held in July...</p>

The name attribute was deprecated as a fragment identifier in XHTML 1.0 and removed entirely in XHTML 1.1.

Every document on the Web has a unique address. The document's address is known as its uniform resource locator - URL.

Several XHTML elements include a URL attribute value, including hyperlinks, inline images, and forms. All use the same URL syntax.

DAISY DTBs also use the URL syntax (although sometimes referred to as a URI) to provide linkage information.

URL Examples

Absolute - pointing to web server on the World Wide Web
http://www.w3.org
Absolute - pointing to specific document at web server on the World Wide Web
http://www.w3.org/WAI/ATAG10-Conformance.html
Relative - pointing to other document at same server
../news/index.html
Relative - pointing to fragment (of an id or name attribute) in other document
document.html#section2
Relative - pointing to fragment (of an id or name attribute) in same document
#section2
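
Relative URLs are resolved against the address of the document that contains them. The Python sketch below uses the standard library's urljoin to show this; the base address is a hypothetical one chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical address of the document that contains the links.
base = "http://www.example.org/docs/guide/document.html"

print(urljoin(base, "../news/index.html"))
# http://www.example.org/docs/news/index.html
print(urljoin(base, "document.html#section2"))
# http://www.example.org/docs/guide/document.html#section2
print(urljoin(base, "#section2"))
# http://www.example.org/docs/guide/document.html#section2
print(urljoin(base, "http://www.w3.org"))
# http://www.w3.org
```

An absolute URL is returned unchanged, while each relative form is completed from the base.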

There are two main cases when certain characters cannot be typed as-is in text nodes or attribute values:

  1. Their presence would be misinterpreted as markup
  2. They are not available on the system/keyboard in use

To handle this problem, XML uses a construct called character entity references. This is a "virtual" reference to a certain character.

A character entity reference always begins with the ampersand sign (&) and always ends with the semicolon sign (;).

To cover for the first case above (misinterpretation as markup), XML predefines five character entity references. These are:

&lt;
The less-than sign - the opening angle bracket (<)
&amp;
The ampersand (&)
&gt;
The greater-than sign - the closing angle bracket (>)
&quot;
The straight double quotation mark (")
&apos;
The apostrophe; a.k.a. the straight single quote (')
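
As an illustration, the five predefined references can be applied with Python's standard xml.sax.saxutils module. Its escape function handles &, < and > by default; the two quote characters are passed in explicitly here. The sample text and the dictionary names are made up for the example.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; the quotes are added as extra entities.
QUOTES = {'"': "&quot;", "'": "&apos;"}
UNQUOTES = {"&quot;": '"', "&apos;": "'"}

text = "Tom & Jerry's \"1 < 2\""
escaped = escape(text, QUOTES)
print(escaped)  # Tom &amp; Jerry&apos;s &quot;1 &lt; 2&quot;
print(unescape(escaped, UNQUOTES) == text)  # True
```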

To cover for the second case above (character not available on the used keyboard), XHTML has defined three sets of named character entity references.

Refer to the XHTML 1 Alphanumeric Character Entities table for a full listing.

Let's say you have a piece of XHTML markup code that you want to include in a tutorial about markup. The semantically correct element to use for code snippets is code. The piece of markup that you want to include in the tutorial text looks like this:


 <p>
   A paragraph with an
     <em>emphasized</em>
   word.
 </p>

Now, two problems arise.

  1. If you just paste this example in as-is, the less than and greater than signs will be interpreted as markup by the XHTML parser/browser (Internet Explorer for example). This will result in the code snippet not showing as code in the display.

    This is solved by escaping the less than and greater than signs using character entities as described above.

  2. The whitespace (linebreaks, tabs, spaces etc.) in the example will be collapsed into a single space. This is the default behavior of parsers/browsers.

    This is solved by using the pre element. The pre element means "preformatted" and indicates that the text of pre shall have its original whitespace characters preserved.

The actual code of the example snippet becomes:

<pre>
  <code>
    &lt;p&gt;
      A paragraph with an
        &lt;em&gt;emphasized&lt;/em&gt;
      word.
    &lt;/p&gt;
  </code>
</pre>
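
The escaping step can be automated. The following Python sketch builds such a block from a raw snippet and then parses it back to confirm that the code element holds only text, with the original whitespace intact. It is an illustration under assumptions, not part of any DAISY tool, and the variable names are arbitrary.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

snippet = "<p>\n  A paragraph with an\n    <em>emphasized</em>\n  word.\n</p>"

# Escape the markup characters, then wrap the result in <pre><code>.
block = "<pre><code>" + escape(snippet) + "</code></pre>"
print(block)

code = ET.fromstring(block).find("code")
print(len(code))             # 0 (no child elements, the snippet is plain text)
print(code.text == snippet)  # True
```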

The lang and xml:lang attributes are used to convey the natural language of the presentation. See further in the Internationalization tutorial.
Language codes are available in Language Code Listing.

  <html lang="en-GB" xml:lang="en-GB" >
   <head>...</head>
   <body>...</body>
  </html>
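
The two attributes are read slightly differently by XML tools, because xml:lang belongs to the reserved XML namespace. A short Python sketch, assuming the standard library's ElementTree parser:

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved xml: prefix to its namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

root = ET.fromstring(
    '<html lang="en-GB" xml:lang="en-GB"><head></head><body></body></html>'
)
print(root.get("lang"))    # en-GB
print(root.get(XML_LANG))  # en-GB
```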

In an XHTML document, the DOCTYPE declaration for Transitional DTD content should be as follows:


 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html>
 ...
 </html>

The DOCTYPE declaration for Strict DTD content should be as follows:


 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 <html>
 ...
 </html>

When an XML declaration is included, it comes first, before the DOCTYPE declaration:


 <?xml version="1.0"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html>
 ...
 </html>

If you use a character set encoding other than UTF-8 or UTF-16, you must specify an encoding attribute on the XML declaration. (See the Internationalization tutorial for more.)


 <?xml version="1.0" encoding="iso-8859-1" ?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
 <html>
 ...
 </html>
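
The effect of the encoding attribute can be observed with an XML parser. In the Python sketch below, the same ISO-8859-1 bytes parse correctly when the declaration is present and are rejected when it is absent, since the parser then assumes UTF-8. The sample paragraph is made up, and the DOCTYPE is left out to keep the example short.

```python
import xml.etree.ElementTree as ET

# "caf\u00e9" ends in e-acute, which ISO-8859-1 stores as one byte that is not valid UTF-8.
body = "<p>caf\u00e9</p>"
declared = ('<?xml version="1.0" encoding="iso-8859-1" ?>' + body).encode("iso-8859-1")
undeclared = body.encode("iso-8859-1")

print(ET.fromstring(declared).text == "caf\u00e9")  # True

try:
    ET.fromstring(undeclared)
except ET.ParseError:
    print("rejected: without a declaration the parser assumed UTF-8")
```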



This page was last edited by PVerma on Friday, August 6, 2010 23:23
Text is available under the terms of the DAISY Consortium Intellectual Property Policy, Licensing, and Working Group Process.