Presentation: XML From DAISY Proposed For Worldwide Use

Original Author(s): George Kerscher
Last Revised: 2006-08-01
Author: George Kerscher, Secretary General, DAISY Consortium


XML is clearly the technical mechanism that should be used to represent information in the world today. However, there are many outstanding XML vocabularies, and endorsing one or several of these is a difficult decision. The XML vocabulary defined in the DAISY/NISO Standard, named DTBook, is particularly suited for conversion of print books or PDF; it is designed as a conversion XML vocabulary to upgrade print or PDF into a form that can be used effectively by persons with print disabilities. This document explains the rationale for endorsing DTBook, as defined in the DAISY/NISO Standard, as the XML vocabulary that should be used in the provision of accessible content for people who have a print disability. It also explains the supporting systems and describes the ongoing evolution of the standards.

XML is the modern, flexible specification developed for representing information effectively and meaningfully. It is a standard maintained by the World Wide Web Consortium (W3C). XML is central to all new developments in the production of information; Microsoft Office Suite and Open Office are relying on XML technology at the heart of their systems. XML is growing in the publishing community and is widely used in all sectors of Information Technology developments. Most importantly, XML enables the separation of structure and content from the presentation of information. Persons with disabilities require presentational systems that are tailored to their particular disability. Audio and braille presentations are essential for persons who are blind, audio and large font display are essential for persons with low vision, and audio with synchronized highlighted text is essential for persons with learning and cognitive disabilities. XML facilitates all of these presentations.
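The separation of structure from presentation can be illustrated with a minimal sketch. The sample fragment, the two rendering functions, and the spoken labels below are all hypothetical and chosen only for illustration; they are not taken from the DAISY/NISO Standard or from any real reading system. The same marked-up content is rendered once for large print and once as an audio script.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; the element names are illustrative only.
SAMPLE = """
<book>
  <h1>Chapter 1</h1>
  <p>XML separates structure from presentation.</p>
  <blockquote>Content once, many renderings.</blockquote>
</book>
"""


def render_large_print(root):
    """Visual rendering: headings in upper case, quotations indented."""
    lines = []
    for el in root:
        text = (el.text or "").strip()
        if el.tag == "h1":
            lines.append(text.upper())
        elif el.tag == "blockquote":
            lines.append("    " + text)
        else:
            lines.append(text)
    return lines


def render_speech(root):
    """Audio rendering: announce each element's role before its text."""
    labels = {"h1": "Heading", "p": "Paragraph", "blockquote": "Quotation"}
    return [
        f"{labels.get(el.tag, 'Text')}: {(el.text or '').strip()}"
        for el in root
    ]


root = ET.fromstring(SAMPLE)
large_print = render_large_print(root)
speech = render_speech(root)

for line in large_print:
    print(line)
for line in speech:
    print(line)
```

Neither renderer changes the source file; each one only decides how an element of a given type is presented, which is the property that makes tailored presentations possible.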

The DAISY/NISO Standard was developed to make print material accessible to print-disabled persons. The DTBook element set is at the core of the DAISY/NISO Standard for markup of a textual content file. DTBook was developed specifically to facilitate the rendering of print materials so that they are accessible to persons with print disabilities. The three most likely alternative element sets were evaluated and rejected for the following reasons. HTML was not found to be detailed enough to support our needs. DocBook was deemed too rigid: since the print textbooks that are the starting point are so varied in structure, it is not possible to use a DTD that enforces a prescribed document structure, as DocBook does. TEI (Text Encoding Initiative) was found to be too complex: it was developed for scholarly use, and as such was expected to be used by highly trained and skilled persons. Even TEI Lite covers many areas that are of no use to our community and has far more elements than DTBook.

Document Type Definition Required

We concluded after much debate that we needed a conversion Document Type Definition (DTD); that is, an element set that describes the structure and content of a print book in sufficient detail to enable it to be converted to alternative formats. Once work on the DTD began, we concluded that it would make sense to begin with common HTML tags, as they were already widely used and their semantics are understood, and to augment them as needed. The final tag set contains many HTML tags as well as a good number of new tags developed for this application.

DAISY/NISO 2005 is Already a Recognized Standard

The DAISY Consortium is made up of organizations throughout the world. The focus of DAISY is the development of information technology that is fully accessible and highly functional. The intent is to move this technology into the mainstream of society.

The DAISY Consortium developed the "ANSI/NISO Z39.86-2005, Specifications for the Digital Talking Book" Standard. The American National Standards Institute (ANSI) and the National Information Standards Organization (NISO) have already endorsed this standard. Both of these are formal standards bodies recognized in the United States.

The DAISY Consortium, incorporated as a non-profit, charitable association under Swiss law, is the official maintenance agency for the DAISY/NISO Standard. This means the maintenance of this standard is a fully international collaborative activity. Participation on this standards committee is open to any technically qualified person who is committed to the maintenance and development of the Standard.

Extending DTBook to Support Mathematics and other Fields

A metaphor known as the "pizza" metaphor has been used in describing DTBook to clarify the use of the base tag set and additional modules. One can think of the DTBook tag set as being a basic cheese pizza, and modules as toppings that can be added as desired. The DTBook tag set contains the common elements found in textbooks and reading materials used in primary, secondary, and higher educational arenas. The over-arching structures such as front matter, headings for parts and other divisions within the body of the book, and rear matter are defined. Block-type elements such as paragraphs, block quotes, lists, footnotes and sidebars are also defined. Inline items such as emphasized text, acronyms, citations, footnote references and sentences are identified. A complete list of these can be found in the Standard. The point here is that the basic types of books can normally be represented in XML using this basic cheese pizza.
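As a rough illustration of the "cheese pizza", the sketch below parses a small DTBook-style fragment and derives a heading outline of the kind a reading system could use for navigation. The fragment is simplified and hypothetical: the namespace declaration and the head metadata are omitted, so it is not a valid DTBook file, and the element names should be checked against the Standard before being relied upon. The outline function is likewise an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: the namespace declaration and the
# <head> metadata are omitted, so this is not a valid DTBook file.
SAMPLE = """
<dtbook>
  <book>
    <frontmatter>
      <doctitle>Sample Book</doctitle>
    </frontmatter>
    <bodymatter>
      <level1>
        <h1>Part One</h1>
        <p>Opening paragraph with <em>emphasis</em>.</p>
        <level2>
          <h2>A Subsection</h2>
          <list type="ul"><li>First item</li><li>Second item</li></list>
        </level2>
      </level1>
    </bodymatter>
    <rearmatter>
      <level1><h1>Index</h1></level1>
    </rearmatter>
  </book>
</dtbook>
"""

# Heading element name mapped to its nesting depth.
HEADINGS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


def outline(root):
    """Return (depth, title) pairs for every heading, in document order."""
    return [
        (HEADINGS[el.tag], "".join(el.itertext()).strip())
        for el in root.iter()
        if el.tag in HEADINGS
    ]


toc = outline(ET.fromstring(SAMPLE))
for depth, title in toc:
    # Indent each title according to its heading depth.
    print("  " * (depth - 1) + title)
```

Because the over-arching structure is explicit in the markup, the outline falls out of the element names alone, with no guessing from font size or layout.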


Now think of separate modules that can be added as toppings on a pizza. A drama module that would contain the vocabulary (elements) needed to mark up plays might be considered green peppers for the pizza. The modules that can be added to the basic structure depend on the information one needs to convey.

The W3C has used this modular approach as well. One topping we are in the process of adopting for our pizza is the work they have done with MathML. This is definitely meat on the pizza!

The work conducted in the W3C paid close attention to the needs of persons with disabilities, but there is a lot of work, requiring significant resources, that needs to be done with accessibility tools to take advantage of math provided in XML in this way. The DAISY Consortium has a working group developing the techniques to add MathML to DAISY. To see this work visit the Project area for MathML-in-DAISY.

Other modules will be required as time goes on, such as for other scientific disciplines, dictionaries, music, and so forth. The DTBook DTD was designed to meet the majority of the markup needs in non-technical books. It was deliberately kept lean so it would be easy to learn and use. The other modules to be developed would only be used as needed, thus minimizing complexity for users. DTBook incorporates a simple XML mechanism, described in section 4.2.2 of the standard and in the DTD itself, for incorporating tags from other element sets as needed.
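The following sketch does not reproduce the extension mechanism of section 4.2.2. It only illustrates, from the consuming side, how a tool might notice that a "topping" is present: elements from another element set sit in their own XML namespace. The DTBook namespace URI shown is an assumption to be verified against the Standard, and the sample paragraph and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed DTBook 2005 namespace; verify against the Standard before use.
DTBOOK_NS = "http://www.daisy.org/z3986/2005/dtbook/"
# MathML namespace as published by the W3C.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical paragraph mixing base elements with a MathML expression.
SAMPLE = f"""
<p xmlns="{DTBOOK_NS}" xmlns:m="{MATHML_NS}">
  The area is
  <m:math><m:mi>a</m:mi><m:mo>=</m:mo><m:msup><m:mi>r</m:mi><m:mn>2</m:mn></m:msup></m:math>.
</p>
"""


def namespace_of(el):
    """ElementTree stores tags as '{namespace}local'; return the namespace."""
    return el.tag[1:].split("}", 1)[0] if el.tag.startswith("{") else ""


def foreign_namespaces(root, base_ns):
    """Collect namespaces of elements outside the base tag set."""
    return {namespace_of(el) for el in root.iter()} - {base_ns}


modules = foreign_namespaces(ET.fromstring(SAMPLE), DTBOOK_NS)
print(modules)
```

A reading system that finds an unfamiliar namespace in this set can decide whether it has a suitable renderer for that module or needs to fall back to an alternative presentation.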

DAISY Provides the Infrastructure for the Evolving Standard

Much more than the identification of an XML vocabulary must be in place to support an international standard. Guidelines must be identified that clarify the tags, their semantics, and their proper use. Samples must be available that demonstrate usage. Mechanisms for tracking errors, enhancement requests, and future directions for the evolving standard must all be in place.

Visit the DAISY/NISO Standard maintenance area to view the full range of support mechanisms in place.

Structure Guidelines, an Example of Supporting Materials

The presence of semantics, or the meaning of an element (one word in an XML vocabulary), is critical for a reading system to correctly present information to persons with disabilities. With print, it is the visual information that helps the reader to use the information efficiently. The eye can take in a whole page at a glance, and the student learns to visually identify headings, sidebars, footnotes, etc. However, a student with a print disability cannot "see" these visual cues and must access the equivalent information through a reading system that understands the "meaning" of the tags that stand in for the visual cues, and that can present different types of information differently depending on the student's reading requirements.

It is just as important, for example, for readers who cannot read print to know if they are reading a paragraph, a list item, or a block of quoted text. This semantic information is provided in the source files by applying the correct XML element. The publishers asked us to associate the description of XML elements with something familiar in the publishing industry. The Chicago Manual of Style was chosen, as it is a definitive reference work used in the publishing industry to identify visual elements and their usage in print.

In the "Structure Guidelines" each element is semantically defined, with reference to the Chicago Manual of Style, or another authoritative reference work. Examples of usage are provided with each tag.

The DAISY Consortium is asking each national organization and international body, such as the EU, to endorse the XML Vocabulary defined in the DAISY/NISO Standard wherever legislation or agreements are being developed to serve persons with disabilities:

  • We want a single worldwide standard for the provision of XML content to libraries or organizations serving persons with disabilities;
  • We want to see support for DAISY in mainstream eBook technology;
  • We want to see "save as DAISY" in publishing production products;
  • We want to see mainstream authoring tools recognize the importance of semantics in XML content;
  • We want to see greater support for DAISY in braille software;
  • We want to see greater support for DAISY in large print tools;
  • We want to see publishers produce publications that can be accessed and read by persons with disabilities.

Explanatory note: The DAISY/NISO Standard is a comprehensive multimedia standard. It includes a complete navigation model, also defined in XML. It provides a bookmark specification and other supporting components of the complete specification. DTBook is the XML vocabulary for the textual content document. It is this segment of the DAISY Standard that is the focus of agreements and legislation. With this rich source for the content, high performance information products can be delivered to persons with disabilities.

Momentum Behind the DAISY/NISO Standard

Currently, Members and Friends of the DAISY Consortium are embracing the XML in DAISY. It is being used as the foundation of production of content that has a text component. We are also beginning to see the endorsement of DAISY in national legislation.

In the USA, DAISY has been named as the standard for the National Instructional Materials Accessibility Standard (NIMAS). Furthermore, in the reauthorization of the Individuals with Disabilities Education Act (IDEA), publishers of elementary and secondary textbooks are required to provide DAISY XML files to facilitate the production of DAISY multimedia Digital Talking Books, braille, and other accessible formats. NIMAS not only identifies DAISY as the standard to be used, but also establishes guidelines for how richly marked up the publisher's files must be. A Web site has been established for Support for the DAISY XML Vocabulary in the NIMAS Standard.

National organizations are encouraged to endorse the XML in DAISY without any modification to the standard. This adoption promotes standards harmonization. Any country or organization is welcome to participate in the standards development process to improve the standard through the years.

We encourage the adoption of language that allows legislation and agreements to "roll forward." What we foresee is the continued evolution of the DAISY Standard through many additional modular extensions. We also foresee the Standard progressing in step with advanced XML developments such as RELAX NG, W3C Schema, RDF, and the Semantic Web. It is best if legislation and agreements can automatically move forward as the standards continue to evolve.


This page was last edited by DAISY1 on Tuesday, October 12, 2010 18:01
Text is available under the terms of the DAISY Consortium Intellectual Property Policy, Licensing, and Working Group Process.