Introduction to the XML concept

Original Author(s): Markus Gylling

eXtensible Markup Language


Markup refers to the sequence of characters or other symbols that you insert at certain places in a text or word processing file to indicate how the text should look or to describe its logical structure.

Main categories:

  • presentational
  • descriptive

Veni, vidi, vici

Markup can be used to say:

  • descriptive: This is a quote, in Latin, allegedly by Caesar.
  • presentational: Display it in italics with Arial font.
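The difference between the two categories can be sketched in a few lines of Python. This is only an illustration: the element name `quote`, the `attribution` attribute, and both rendering functions are hypothetical, chosen for this example. The point is that the descriptive label is stored once, and the presentation is decided later by whoever consumes it.

```python
import xml.etree.ElementTree as ET

# Descriptive markup: the labels say what the text *is*.
# The element and attribute names are hypothetical.
descriptive = '<quote xml:lang="la" attribution="Caesar">Veni, vidi, vici</quote>'
quote = ET.fromstring(descriptive)

# ElementTree stores xml:lang under this namespaced key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def render_visual(el):
    """One possible presentation: italics in Arial (HTML-like output)."""
    return f'<i style="font-family: Arial">{el.text}</i>'


def render_speech(el):
    """Another presentation: a text prompt for a speech synthesizer."""
    language = el.get(XML_LANG)
    speaker = el.get("attribution")
    return f"Quote in language {language}, attributed to {speaker}: {el.text}"


print(render_visual(quote))
print(render_speech(quote))
```

The same labeled source feeds both renderings; nothing in the source itself mentions fonts or voices.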


XML is a simple, flexible, descriptive text markup format.

The idea is that the markup doesn't tell you what to do (graphically or otherwise) with a piece of text; it tells you what the text is. Another term could be "labeling."

All XML does is provide a nice flexible internationalized way to label the elements of a data structure and ship them around with the labels attached.

Descriptive markup was born in the world of publishing technology, as it had many advantages for serious large-scale publishing.

Descriptive markup makes it possible to define a data structure and helps us understand its meaning and context by providing ways to describe the semantics and structure of the information.

It also provides the means to use grammars tailored to the specific context or need.

Antecedent: SGML

SGML, the Standard Generalized Markup Language, deals with the structural (descriptive) markup of electronic documents.

It was made an international standard by ISO in October 1986.

SGML soon became very popular, thanks in particular to its acceptance in the publishing world, by large multinational companies and by governmental organizations, and, more recently, to the emergence of HTML (HyperText Markup Language), the source language of structured documents on the World Wide Web.

Reasons for and principles of XML

  • "Fixing the web"
    • SGML too complex
    • A need for meaningful (descriptive) markup
    • Device independence
  • A unified format for the publishing industry

Why is XML such an important development?

XML allows the flexible development of user-defined document types.

It provides a...

  • robust
  • flexible
  • non-proprietary
  • persistent
  • verifiable
  • cross-platform
  • cross-language

...file format for the storage and transmission of text and data both on and off the Web; and it removes the more complex options of SGML, making it easier to program for.

XML removes two constraints which were holding back Web and Electronic Information developments:

  • dependence on a single, inflexible document type (HTML) which was being much abused for tasks it was never designed for;
  • the complexity of full SGML, whose syntax allows many powerful but hard-to-program options.

Who is responsible?

XML is a project of the World Wide Web Consortium (W3C), and the development of the specification is being supervised by their XML Working Group. A Special Interest Group of co-opted contributors and experts from various fields contributed comments and reviews by email.

XML is a public format: it is not a proprietary development of any company. The v1.0 specification was accepted by the W3C as a Recommendation on February 10, 1998.

Why do we need it? Why not just use Word or Notes? Or HTML?

Some typical replies off the web:

  • Information on a network which connects many different types of devices has to be usable on all of them.
  • Public information cannot afford to be restricted to one make or model or manufacturer, or to cede control of its data format to private hands.
  • It is also helpful for such information to be in a form that can be reused in many different ways, as this can minimize wasted time and effort.
  • Proprietary data formats, no matter how well documented or publicized, are simply not an option: their control still resides in private hands and they can be changed or withdrawn arbitrarily without notice.

XML does nothing

XML is a meta-language, used to create new languages.

XML is but a set of rules defining a common outer syntax for the markup.
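A small sketch of what "common outer syntax" means in practice, using Python's standard-library parser. The two vocabularies (`recipe`, `invoice`) and the helper name `is_well_formed` are made up for this example. The parser knows nothing about either language, yet it can read both, and it rejects any text that breaks the shared syntax rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text obeys XML's outer syntax rules."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Two unrelated, made-up vocabularies: one generic parser reads both.
recipe = "<recipe><ingredient amount='2'>eggs</ingredient></recipe>"
invoice = "<invoice><line price='9.50'>paper</line></invoice>"

# Breaks the rules: <a> is closed while <b> is still open.
overlapping = "<a><b>text</a></b>"

for sample in (recipe, invoice, overlapping):
    print(is_well_formed(sample))
```

Note that this checks syntax only (well-formedness). Checking a document against the grammar of a particular language is a separate step that this sketch does not cover.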

XML is extensible

The markup used in HTML documents and the structure of HTML documents are predefined. The author of an HTML document can only use tags that are defined in the HTML standard. XML allows authors to define their own elements and their own document structure.

XML document example
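
As an illustration, a small XML document with author-defined elements might look like the string below, read here with Python's standard library. The element names (`book`, `title`, `chapter`, `heading`, `para`) and the `id` values are assumptions invented for this sketch; XML itself defines none of them.

```python
import xml.etree.ElementTree as ET

# A hypothetical document type: every element name below was chosen
# by the document's author, not by the XML specification.
document = """<?xml version="1.0"?>
<book>
  <title>A Short Book</title>
  <chapter id="ch1">
    <heading>Beginning</heading>
    <para>First paragraph.</para>
  </chapter>
  <chapter id="ch2">
    <heading>End</heading>
    <para>Last paragraph.</para>
  </chapter>
</book>
"""

root = ET.fromstring(document)

# Because the labels describe structure, a table of contents can be
# derived directly from the markup.
title = root.findtext("title")
headings = [ch.findtext("heading") for ch in root.findall("chapter")]
chapter_ids = [ch.get("id") for ch in root.findall("chapter")]

print(title)
print(headings)
```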

Industry adoption

XML and Accessibility: the future

"The power of the Web is in its universality. Access by everyone regardless of disability is an essential aspect."

Tim Berners-Lee, W3C Director and inventor of the World Wide Web

Emerging XML-based languages with a promise of enhanced accessibility, such as:

  • SVG
  • XHTML 2

XML and Accessibility: multimodal interaction

Multimodal Content

The Dream

  • Adapting the Web to allow multiple modes of interaction:
    • GUI, Speech, Vision, Pen, Gestures, Haptic interfaces, ...
  • Augmenting human to computer and human to human interaction
    • Communication services involving multiple devices and multiple people
  • Anywhere, Any device, Any time
    • Services that adapt to the device, user preferences and environmental conditions
  • Accessible to all

The Multimodal Interaction Activity is extending the Web user interface to allow multiple modes of interaction, offering users the choice of using their voice, or an input device such as a key pad, keyboard, mouse, stylus or other input device. For output, users will be able to listen to spoken prompts and audio, and to view information on graphical displays. The Working Group is developing markup specifications for synchronization across multiple modalities and devices with a wide range of capabilities.

XML and Accessibility: requirements

For XML to enhance information accessibility, the following is required:

  • Content and Information that is well structured
  • Content Structures that are semantically meaningful
  • Multimodal interaction with the content


First transition to a full XML fileset with DAISY 2.02 (2001)

This recommendation uses XHTML 1.0, an XML reformulation of HTML:

  • Adds a significant simplification for playing devices
  • Adds future-proofing
  • But does not add grammars authored for the particular purpose

First release of DAISY 3 (Z39.86-2002) (2002)

Introduces specific grammars for:

  • The Navigation Control Center (NCX)
  • The Full Text of the publication (DTBOOK)
  • Device-interchangeable bookmarks
  • and more...

These grammars focus on strictness of structure and on a simple but semantically rich vocabulary for print books.

A structurally and semantically correct XML source document can be used to create many different output formats, such as:

  • A DAISY 2.02 Talking Book
  • A DAISY 3 Talking Book
  • Braille Print
  • Dynamic Braille
  • Large-print
  • E-text (ASCII, XHTML, ...)

One XML master - reusability

  • for different output formats
  • for new editions of the same book

The DTBOOK DTD is purposely designed for the single-source master concept.

Using XSLT and other automated or semi-automated transformation processes, the source document can be prepared for the output destination, if any preparation is needed at all (DTBOOK is itself a good e-text format, for example).

Strict XHTML 1.0 documents can also be upgraded to DTBOOK at a later stage using XSLT.
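
The single-source idea can be sketched as follows. Two caveats: the source below uses hypothetical, heavily simplified element names rather than the real DTBOOK grammar, and plain Python functions stand in for XSLT stylesheets, since XSLT is not part of Python's standard library. Each function plays the role of one transform that prepares the master for one output destination.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified master; NOT the real DTBOOK grammar.
SOURCE = """<book>
  <chapter>
    <heading>One</heading>
    <para>Hello world.</para>
  </chapter>
</book>"""


def to_plain_text(root):
    """E-text output: headings in upper case, paragraphs unchanged."""
    lines = []
    for el in root.iter():
        if el.tag == "heading":
            lines.append(el.text.upper())
        elif el.tag == "para":
            lines.append(el.text)
    return "\n".join(lines)


def to_xhtml_fragment(root):
    """XHTML-like output: map each source element to an HTML element."""
    mapping = {"heading": "h1", "para": "p"}
    body = ET.Element("body")
    for el in root.iter():
        if el.tag in mapping:
            out = ET.SubElement(body, mapping[el.tag])
            out.text = el.text
    return ET.tostring(body, encoding="unicode")


master = ET.fromstring(SOURCE)
print(to_plain_text(master))
print(to_xhtml_fragment(master))
```

The master is parsed once and never modified; adding another output format means adding another transform, not editing the source.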

Access to the Source Document


Access to publishers' files

Output format variations


  • Manual typing
  • Scanning with automated conversion into text (ASCII, RTF)
  • Publisher files conversion

Primary benefits of XML

  • openness
  • wide adoption
  • grammar specificity and extensibility
  • information integrity
  • information longevity
  • focus on structure and semantics, not presentation
  • multiple uses, multiple modalities
  • usable by humans and machines

This page was last edited by DAISY1 on Thursday, March 21, 2013 16:34
Text is available under the terms of the DAISY Consortium Intellectual Property Policy, Licensing, and Working Group Process.