On Books and XML

Original Author(s): Markus Gylling

Horace’s Monument

Exegi monumentum aere perennius.

(Horatius (65 B.C.–8 B.C.), Carmina (Odes III), referring to his poems: "I have built a monument more lasting than bronze.")

Defining 'markup'

Markup refers to the sequence of characters or other symbols that are inserted at certain places in a text or word processing file.

Presentational Markup
  • Embedding of codes in the text expressing font, size, color etc.
  • Examples are MS Word and other word processing systems.
Descriptive Markup
  • The markup doesn't tell you what to do with a piece of text; it tells you what the text is. Another term could be "labeling".
  • Examples: XML (and its predecessor SGML).
  • Descriptive markup was born in the world of publishing technology, and has many advantages for serious large-scale publishing.

Exegi monumentum aere perennius.

Markup can be used to say:
  • descriptive: This is a quote, in Latin, from Horatius' Carmina.
  • presentational: Display this text in italics with Arial font.
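
The contrast can be made concrete with a small sketch. It assumes two hypothetical encodings of the same quote (the element and attribute names `quote`, `author`, `work` and the `span` styling are invented for illustration) and uses Python's standard `xml.etree.ElementTree` to ask each encoding what a program can learn from the markup alone.

```python
import xml.etree.ElementTree as ET

# Descriptive markup: says what the text is (names are hypothetical).
DESCRIPTIVE = (
    '<quote xml:lang="la" author="Horatius" work="Carmina">'
    "Exegi monumentum aere perennius."
    "</quote>"
)

# Presentational markup: says only how the text should look.
PRESENTATIONAL = (
    '<span style="font-family: Arial; font-style: italic">'
    "Exegi monumentum aere perennius."
    "</span>"
)

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def describe(xml_text):
    """Return what a program can learn about the text from its markup."""
    element = ET.fromstring(xml_text)
    return {
        "kind": element.tag,
        "language": element.get(XML_LANG),
        "author": element.get("author"),
    }


descriptive_facts = describe(DESCRIPTIVE)
presentational_facts = describe(PRESENTATIONAL)
```

The descriptive version yields a kind, a language and an author; the presentational version yields only a generic `span` and no information about the text itself.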

The rationale for XML (1998)

Systems based on presentational markup...

  • are inflexible;
  • offer poor longevity;
  • offer poor reusability;
  • ...can merely be used to produce nice-looking publications.

Existing systems based on descriptive markup...

  • are too complicated and therefore costly to use (SGML).

The need to 'fix the web'...

The predominant use of an inflexible and messy presentational approach to information on the web had created a body of information that in the long run was doomed to become indistinguishable from noise.

Who is responsible?


XML is a project of the World Wide Web Consortium (W3C), and the development of the specification is supervised by its XML Working Group. A Special Interest Group of co-opted contributors and experts from various fields contributed comments and reviews by email.

XML is a public format: it is not a proprietary development of any company. The v1.0 specification was accepted by the W3C as a Recommendation on February 10, 1998.

What XML is... and is not

XML is not much - this is its strength.

All XML does is provide a nice flexible internationalized way to label the elements of a data structure and ship them around with the labels attached.

(Tim Bray)

XML is a metalanguage, from which new languages are created.

XML can be seen as a set of basic rules that constrain the syntax of these new languages, which are often called "XML grammars".

Extensible Markup Language (XML) 1.1 (Second Edition) W3C Recommendation 16 August 2006, edited in place 29 September 2006
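
As a rough illustration of the "set of basic rules" idea, the sketch below checks well-formedness with Python's standard `xml.etree.ElementTree`. The two small "languages" (`recipe`, `poem`) are invented for the example; they share nothing except XML's syntax rules.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text obeys XML's basic syntax rules."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Two invented languages built on the same basic rules.
recipe = "<recipe><ingredient amount='2'>eggs</ingredient></recipe>"
poem = "<poem><line>Exegi monumentum aere perennius.</line></poem>"

# Breaks a basic rule: elements must be closed in the reverse order of opening.
broken = "<poem><line>Exegi monumentum</poem></line>"

results = [is_well_formed(text) for text in (recipe, poem, broken)]
```

Well-formedness is only the common floor. Whether `ingredient` may appear inside `poem` is a question for each grammar, not for XML itself.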

XML and self-awareness

[Figure: the kanji characters for jikaku, "self-awareness"]

Via the emphasis on descriptive markup, XML allows for elements of information/text to become self-aware, i.e. hold information on their own nature.

Information self-awareness has both structural and ontological semantic aspects:

Who are you?

[structural semantic] I am a table cell in the second row, third column, of my parent table

[ontological semantic] I hold a maxim, from a given timespan in the classical age, in original and translated language versions.
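
The two answers above can be sketched in code. This is only an illustration under stated assumptions: the table markup and the `type` and `period` attributes are hypothetical, and the structural answer is computed from the cell's position while the ontological answer is read from its labels.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a table whose last cell holds a maxim in two languages.
TABLE = """
<table>
  <tr><td>a</td><td>b</td><td>c</td></tr>
  <tr>
    <td>d</td>
    <td>e</td>
    <td type="maxim" period="classical age">
      <text xml:lang="la">Exegi monumentum aere perennius.</text>
      <text xml:lang="en">I have built a monument more lasting than bronze.</text>
    </td>
  </tr>
</table>
"""

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def who_are_you(table, cell):
    """Answer for one cell: structural position plus ontological labels."""
    for row_number, row in enumerate(table.findall("tr"), start=1):
        for column_number, candidate in enumerate(row.findall("td"), start=1):
            if candidate is cell:
                return {
                    "structural": f"table cell in row {row_number}, column {column_number}",
                    "ontological": {
                        "type": cell.get("type"),
                        "period": cell.get("period"),
                        "languages": [t.get(XML_LANG) for t in cell.findall("text")],
                    },
                }
    return None


table = ET.fromstring(TABLE)
maxim_cell = table.find(".//td[@type='maxim']")
answer = who_are_you(table, maxim_cell)
```

Here `answer["structural"]` reports the second row and third column, and `answer["ontological"]` reports a maxim held in Latin and English.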

[Figure: self-aware statistics in a graphic environment]

XML and the Book

Because an XML book builds on descriptive rather than presentational markup, it can be the basis for multiple output formats: a polymorphic embryo.

In the context of a library serving people who are print disabled:

  • Narrated talking books
  • Synthesized talking books
  • Braille
  • Large print
  • Various e-text formats

... normally with a minimum of human intervention after the XML embryo book is complete.

In a general publishing context:

  • parallel publishing
  • effective reuse in reprint and edition scenarios
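
A minimal sketch of the single-source idea, assuming a toy book grammar (`book`, `title`, `chapter`, `heading`, `para` are hypothetical names, not DTBOOK elements) and three simplified stand-ins for real output formats:

```python
import xml.etree.ElementTree as ET

# Hypothetical source book; no presentational information is stored in it.
BOOK = """
<book>
  <title>Odes</title>
  <chapter>
    <heading>Book III</heading>
    <para>Horace claims that his poems will outlast bronze.</para>
  </chapter>
</book>
"""


def to_plain_text(book):
    """E-text output: headings in upper case, paragraphs unchanged."""
    lines = [book.findtext("title").upper()]
    for chapter in book.findall("chapter"):
        lines.append(chapter.findtext("heading").upper())
        lines.extend(p.text for p in chapter.findall("para"))
    return "\n".join(lines)


def to_large_print_html(book):
    """Large-print output: the same content wrapped in HTML with a big font."""
    body = ET.Element("body", style="font-size: 24pt")
    ET.SubElement(body, "h1").text = book.findtext("title")
    for chapter in book.findall("chapter"):
        ET.SubElement(body, "h2").text = chapter.findtext("heading")
        for para in chapter.findall("para"):
            ET.SubElement(body, "p").text = para.text
    return ET.tostring(body, encoding="unicode")


def to_speech_script(book):
    """Input for a speech synthesizer: announce headings, then read paragraphs."""
    script = [f"Title: {book.findtext('title')}."]
    for chapter in book.findall("chapter"):
        script.append(f"Chapter: {chapter.findtext('heading')}.")
        script.extend(p.text for p in chapter.findall("para"))
    return script


book = ET.fromstring(BOOK)
outputs = (to_plain_text(book), to_large_print_html(book), to_speech_script(book))
```

Each output function decides on presentation for its own medium. The source is never edited, which is what keeps human intervention low once the embryo book is complete.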

XML and the DAISY Talking Book

features that emulate properties of paper print reading

Efficiency and Efficacy; Usability
navigation; not sequential (back/forward) but based on semantics:
  • go to the next page
  • go to page 23
  • skip all the examples, they bore me
  • get out of this table, it bores me
skim reading
note-taking in context
The right of every citizen to consume information at the same time as fellow citizens
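
Semantic navigation of the kind listed above can be sketched as follows. The markup is hypothetical (the element names `pagenum`, `para` and `example` are assumptions for this illustration), and the two functions stand in for "go to page 23" and "skip all the examples".

```python
import xml.etree.ElementTree as ET

# Hypothetical book fragment with page markers and labelled examples.
BOOK = """
<book>
  <pagenum>22</pagenum>
  <para>Descriptive markup labels what a piece of text is.</para>
  <example>A quotation can be labelled as a quotation.</example>
  <pagenum>23</pagenum>
  <para>Labels make navigation by meaning possible.</para>
  <example>A reader can jump straight to a page.</example>
  <para>They also allow whole categories of content to be skipped.</para>
</book>
"""


def page_content(book, number):
    """Return the elements between the given page marker and the next one."""
    current_page = None
    content = []
    for element in book:
        if element.tag == "pagenum":
            current_page = element.text
        elif current_page == str(number):
            content.append(element)
    return content


def skip(elements, unwanted_tag):
    """Drop every element of one kind, for example all examples."""
    return [e for e in elements if e.tag != unwanted_tag]


book = ET.fromstring(BOOK)
page_23 = page_content(book, 23)
reading = [e.text for e in skip(page_23, "example")]
```

Neither operation is possible with purely sequential back/forward controls, and neither needs anything beyond the labels already present in the text.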

features that exceed properties of paper print reading

Media-agnostic and multimedia presentations
text, audio, images, upcoming video; synchronized and in any combination, per user preference
resource-enhancements, self-identifications, based on jikaku
(Who are you? I am a maxim, consisting of...)
The library-wide-web
instant inter-book linking

Extending the Grammar

When authoring XML based on descriptive grammars, it is a very bad idea to ask an element to pretend to be something it isn't.


DTBOOK is a set of approximately 80 elements that focuses on structural semantics, specifically the elements of the print book.

The movement towards new content forms and new sectors of society requires methods for abstraction and extension, and a more dynamic relation to structure, semantics and media types.

Example: XHTML 1.1 modularization

A standardized framework for structural/grammatical modules that can be combined in different ways.

Four types of modules in the "XHTML-Family Markup Language":

  • Required XHTML modules. These must be included in the document's self-definition.
  • Other XHTML modules. These may be included in the document's self-definition.
  • Other W3C modules. Modules defined by other W3C specifications that match the overarching rules of the framework.
  • Private modules. These must also match the overarching rules of the framework.

Examples of W3C modules that have been included in the XHTML 1.1 framework:

  • SVG
  • MathML
  • SMIL 2.0

Yet unanswered questions

  1. How extensible should the grammar be? Two stakeholders:
       reading device manufacturers: predictability/control of behavior
       the information: structure/semantics/mediatypes that best represent the content
  2. How should a document define which extensions it uses?
       Reading devices need this information (?).
  3. What behaviors should be expected from a reading device regarding extended/unknown content types?
  4. How complex can the extension mechanism be?
    To a large extent, this depends on who does the extension:
       in a specification phase: complexity a lesser problem
       in a production phase: complexity a large problem
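
One possible shape of an answer to questions 2 and 3 is sketched below. It is an assumption made for illustration, not a mechanism defined by DAISY or Z39.86: the document announces its extensions through XML namespaces, and a hypothetical reading device falls back to plain text for content it does not support. The private namespace URI and the `widget` element are invented.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical document mixing XHTML, MathML and an invented private module.
DOCUMENT = f"""
<html xmlns="{XHTML}" xmlns:m="{MATHML}" xmlns:x="http://example.org/private-module">
  <body>
    <p>The area is</p>
    <m:math><m:mi>a</m:mi><m:mo>=</m:mo><m:mn>4</m:mn></m:math>
    <x:widget>interactive chart</x:widget>
  </body>
</html>
"""

# What this hypothetical reading device understands.
SUPPORTED = {XHTML, MATHML}


def namespace_of(element):
    """Extract the namespace URI from an ElementTree tag such as '{uri}name'."""
    return element.tag[1:].split("}")[0] if element.tag.startswith("{") else ""


def namespaces_used(root):
    """Question 2: which grammars does this document draw on?"""
    return {namespace_of(e) for e in root.iter()}


def render(root):
    """Question 3: render known content, fall back to text for the rest."""
    output = []
    for child in root.find(f"{{{XHTML}}}body"):
        text = "".join(child.itertext())
        if namespace_of(child) in SUPPORTED:
            output.append(text)
        else:
            output.append(f"[unsupported content: {text}]")
    return output


root = ET.fromstring(DOCUMENT)
unknown = namespaces_used(root) - SUPPORTED
rendered = render(root)
```

The device can detect the unknown module before playback starts, which speaks to the manufacturers' interest in predictability. Whether a text fallback is acceptable behavior remains exactly the open question raised above.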

Overarching challenges in the development work

  • unknown semantics - undefinable behavior
  • "supersetting"
  • versatility vs complexity


XML is a widely adopted, open, non-proprietary, extensible specification for text markup.

Its elementary strength (for text) lies in giving self-awareness (jikaku) to elements of the text. This is done by the distinct separation of descriptive and presentational markup.

Books in XML have a richness (from economic and usability perspectives) unparalleled in other authoring formats. The XML-based book yields numerous possibilities for enhancing the reading experience for everyone.

The new Talking Book is not necessarily audio-centric, but it is definitely XML-centric.

The new Talking Book emphasizes and receives its strength from the jikaku of the text behind the talk.

The new Talking Book represents a movement towards format normalization, in that it is not based on a specialized format (four-track tape, Moon, braille) but rather on a format living at the center of contemporary mainstream technology. This has a (hopefully increasing) impact on the actuality requirement, that is, on giving every reader access to information at the same time as everyone else.

The Z39.86 committee is bringing DAISY 3 forward: XML grammar and media type extensions are two of the main foci over the next years (2005 and forward).


This page was last edited by PVerma on Tuesday, July 13, 2010 00:31
Text is available under the terms of the DAISY Consortium Intellectual Property Policy, Licensing, and Working Group Process.