Breaking it Down: the ePub 3 Spec

Eric Freese

Related: How Publishers Should Prepare for EPUB 3

By Eric Freese, Solutions Architect, Aptara and member of the EPUB3 Working Group

EPUB is widely accepted as the de facto digital format standard for eBooks, with its signature reflowable text that can be read on the greatest variety of reading systems (including the iPad, nook/nookColor, Kobo, and Sony readers, to name a few).

The much-anticipated revised edition, EPUB3, will include new features that promise to greatly enhance the reader experience, such as embedded audio, video, and interactivity.

Meanwhile, publishers hold out hope that the new and improved EPUB standard will rectify the frustration with EPUB 2.0.1 files behaving differently on different reading systems (which has led me, and others, to stress the importance of testing your files on every intended device).

With speculation abounding since the introduction of the spec in the spring of this year, publishers have been waiting to see what’s really possible with EPUB3 — and what reading systems will support it.

To help manage expectations and alleviate confusion, I’ve provided a brief snapshot of some of the spec’s new features, along with what publishers can do to start preparing for them and notes of caution as to what may, or may not, be available in EPUB3 reading systems.



There has been some confusion as to whether HTML5 and EPUB3 will work together. To set the record straight, HTML5 is the base language of EPUB3 (with some minor adjustments to allow for pagination and other reading behaviors). Since EPUB3 content is written in HTML5, the two work hand in hand.

EPUB3 reading systems must be able to process XHTML files written in HTML5. This doesn’t mean that web browsers will be able to display EPUB files, unless they are able to process the additional navigation information contained within the EPUB file. That being said, some reading systems are implemented within a browser environment.


The new baseline for style sheets is CSS2.1 with some CSS3 features added. This will provide much richer layout including multi-column layout, better font support, and directional printing, to name a few. Reading systems are NOT required to support CSS, but almost all of them do. One of the leading causes of frustration is the difference in CSS support between reading systems. In the current EPUB environment, many reading systems do not allow stylesheets within an EPUB file to override the system’s default settings. EPUB3 does not do anything to alleviate this situation and, in fact, might exacerbate it somewhat due to the additional capabilities that are possible. Reading systems also have the ability to implement their own proprietary CSS extensions, which would then be ignored by other reading systems.


Audio can be inserted into eBook files using the HTML5 <audio> tag. This is what Apple, Barnes & Noble, and Amazon have been using all along to embed audio in enhanced eBooks. Now, it’s simply part of the EPUB3 spec. Reading systems are NOT required to support audio, although many do. If a reading system supports audio, it must support MP3. In addition, support of MP4 AAC and media overlays (explained later) is optional.
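As a rough illustration of what embedded audio markup might look like, here is a minimal sketch. The file name audio/chapter1.mp3 and the helper audio_sources are assumptions made for this example; the paragraph inside the audio element is fallback content for reading systems that do not play audio.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical fragment: the file name is an assumption for illustration.
# The nested paragraph is shown only when audio is not supported.
AUDIO_XHTML = """\
<div xmlns="http://www.w3.org/1999/xhtml">
  <audio controls="controls" src="audio/chapter1.mp3">
    <p>Audio narration is not available in this reading system.</p>
  </audio>
</div>
"""


def audio_sources(markup):
    """Return the src of every audio element in a well-formed XHTML fragment."""
    root = ET.fromstring(markup)
    return [el.get("src") for el in root.iter(XHTML + "audio")]


print(audio_sources(AUDIO_XHTML))
```

Parsing the fragment as XML doubles as a quick well-formedness check, which matters because EPUB3 content documents are XHTML.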


Video can be inserted into eBook files using the HTML5 <video> tag. Again, this is what has already been occurring. And again, reading systems are NOT required to support video. In fact, most of the e-Ink devices are not able to show video in a satisfactory manner.

One of my main bones of contention with the EPUB 3 spec is that there is no specified format that must be supported. If a reading system supports video, the spec recommends support of at least one of either H.264 (also known as MPEG-4 AVC) or VP8 video compression formats, but neither is required. Unfortunately, the spec also does not prohibit other formats. Essentially, there is nothing to stop a reading system developer from implementing some other video format (Flash?). Whether that happens remains to be seen, but there is an opening available.

In the meantime, publishers are going to need to prepare videos in both formats to support the widest range of reading systems. As has been discussed in the past, this could lead to very large EPUB3 files, or different versions that target specific reading systems.
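One way to offer both formats is to list two source elements inside a single video element and let the reading system pick the first one it can play. The sketch below assumes H.264 in an MP4 container and VP8 in a WebM container; the file names and the helper source_types are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical fragment: file names are assumptions for illustration.
# demo.mp4 is assumed to hold H.264 video, demo.webm to hold VP8 video.
VIDEO_XHTML = """\
<div xmlns="http://www.w3.org/1999/xhtml">
  <video controls="controls" poster="images/demo.jpg">
    <source src="video/demo.mp4" type="video/mp4"/>
    <source src="video/demo.webm" type="video/webm"/>
    <p>Video is not available in this reading system.</p>
  </video>
</div>
"""


def source_types(markup):
    """Return the declared media type of each source element, in document order."""
    root = ET.fromstring(markup)
    return [src.get("type") for src in root.iter(XHTML + "source")]


print(source_types(VIDEO_XHTML))
```

Shipping both encodings in one file is exactly what drives up the file size, so the same check could be used to decide when a title needs separate device-specific builds.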

Media Overlays

Media overlay functionality was added to the spec to enable text and media to be presented in a combined manner, for example by highlighting text as it is spoken by the computer or played as part of a soundtrack. To employ these overlays, special Synchronized Multimedia Integration Language (SMIL) files will have to be created.

Reading systems are not required to provide this functionality, but if they do, they should allow readers to skip or escape out of overlays. Overlays can also be used to provide text-to-speech functionality. The spec mentions the Pronunciation Lexicon Specification (PLS) and the Speech Synthesis Markup Language (SSML) as the means for providing assistance in generating synthetic speech, but does not require reading systems to use that information.
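To give a feel for what a SMIL overlay file contains, the sketch below pairs two text fragments with clips from one audio file. The file names, fragment IDs, and clip times are invented for illustration, and the helper overlay_pairs is hypothetical; real overlays are normally generated by tooling.

```python
import xml.etree.ElementTree as ET

SMIL = "{http://www.w3.org/ns/SMIL}"

# Hypothetical overlay: each par element ties one text fragment
# to one clip of narration. All names and times are assumptions.
OVERLAY = """\
<smil xmlns="http://www.w3.org/ns/SMIL" version="3.0">
  <body>
    <par id="par1">
      <text src="chapter1.xhtml#sentence1"/>
      <audio src="audio/chapter1.mp3" clipBegin="0s" clipEnd="3.5s"/>
    </par>
    <par id="par2">
      <text src="chapter1.xhtml#sentence2"/>
      <audio src="audio/chapter1.mp3" clipBegin="3.5s" clipEnd="7s"/>
    </par>
  </body>
</smil>
"""


def overlay_pairs(markup):
    """Return (text target, clip begin, clip end) for each par element."""
    root = ET.fromstring(markup)
    pairs = []
    for par in root.iter(SMIL + "par"):
        text = par.find(SMIL + "text")
        audio = par.find(SMIL + "audio")
        pairs.append((text.get("src"), audio.get("clipBegin"), audio.get("clipEnd")))
    return pairs


for pair in overlay_pairs(OVERLAY):
    print(pair)
```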


Scalable Vector Graphics (SVG) files have been allowed within EPUB files for some time. However, their use was limited, due largely to a lack of reading system support. EPUB3 now mandates that reading systems be able to process SVG within the eBook, including allowing users to select text and search within the content of the SVG files. The only portion of SVG that is not allowed is the animation capability.


MathML is part of HTML5, and therefore, it is part of EPUB3. Reading systems must be able to process the presentation form of MathML, but may also support the content form of MathML. I won’t go into a lot of detail here. But publishers that deal with mathematical and scientific content may be interested, as it will allow formulas to be included as part of the XHTML markup — rather than as images. This means the content will be scalable, among other things. It is still recommended that images of the formulas be included as fallbacks.
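As a sketch of a formula carried in the markup with an image fallback, the fragment below uses the MathML altimg and alttext attributes. That is one possible fallback mechanism, assumed here for illustration; the image path and the helper image_fallbacks are hypothetical.

```python
import xml.etree.ElementTree as ET

MATHML = "{http://www.w3.org/1998/Math/MathML}"

# Hypothetical fragment: x squared plus one in presentation MathML,
# with an assumed fallback image path and a text alternative.
FORMULA = """\
<p xmlns="http://www.w3.org/1999/xhtml">
  <math xmlns="http://www.w3.org/1998/Math/MathML"
        altimg="images/eq1.png" alttext="x squared plus one">
    <msup><mi>x</mi><mn>2</mn></msup><mo>+</mo><mn>1</mn>
  </math>
</p>
"""


def image_fallbacks(markup):
    """Return the altimg value of each math element (None when it is missing)."""
    root = ET.fromstring(markup)
    return [m.get("altimg") for m in root.iter(MATHML + "math")]


print(image_fallbacks(FORMULA))
```

A None in the returned list would flag a formula that still needs a fallback image.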

Foreign Resources

Foreign resources are pieces of content that are not a core media type. For example, PDFs might be considered foreign resources. I have seen cases where PDF files are incorporated into EPUB files. When this is done, at least one fallback (perhaps a plain text equivalent) should be included to allow reading systems that don’t support the resource to operate.
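Fallbacks are declared in the package manifest, where an item can point to another item through its fallback attribute. The sketch below follows that chain until it reaches a type every reading system should handle. The item IDs, file names, and the helper resolve_fallback are hypothetical, and CORE_TYPES lists only two core media types to keep the example short.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"

# Partial list of core media types, shortened for the example.
CORE_TYPES = {"application/xhtml+xml", "image/svg+xml"}

# Hypothetical manifest: a PDF that falls back to an XHTML equivalent.
MANIFEST = """\
<manifest xmlns="http://www.idpf.org/2007/opf">
  <item id="report-pdf" href="report.pdf"
        media-type="application/pdf" fallback="report-xhtml"/>
  <item id="report-xhtml" href="report.xhtml"
        media-type="application/xhtml+xml"/>
</manifest>
"""


def resolve_fallback(markup, item_id):
    """Follow the fallback chain from item_id to the first core-type item."""
    root = ET.fromstring(markup)
    items = {item.get("id"): item for item in root.iter(OPF + "item")}
    seen = set()  # guards against circular fallback chains
    current = items.get(item_id)
    while current is not None and current.get("id") not in seen:
        if current.get("media-type") in CORE_TYPES:
            return current.get("href")
        seen.add(current.get("id"))
        current = items.get(current.get("fallback"))
    return None  # no usable fallback was declared


print(resolve_fallback(MANIFEST, "report-pdf"))
```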


Scripting and interactivity are among the most hyped new features of EPUB3. Once again, EPUB3 gets this functionality through HTML5. Usually this means JavaScript, but this is not the only option. While scripting could blur the lines between eBooks and apps, it should be noted that reading system support for scripting is NOT required. Furthermore, reading systems have the ability to place additional limitations on the capabilities provided to scripts for a variety of reasons, including security and processing capabilities.

That being said, publishers should be thinking about possible ways that content can be made more interactive and beginning to plan for creating those enhancements. However, they should also make sure that the reading experience is not adversely affected if a reader decides to turn scripting off, or if a reading system does not provide it.


EPUB3 created a Canonical Fragment Identifier (EPUBCFI) specification for creating and accessing various locations within the content. This allows very fine-grained access to the content, even at the word or phrase level. The use of this spec could allow indexes to link to the exact word within the content. It is also the basis of a planned inter-document linking spec due out in the near future.

Publishers should consider how best to create additional target IDs within their content to speed the linking process. The good news is that reading systems are required to be able to process EPUBCFI addresses, making them more interoperable.
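To show why target IDs help, here is a minimal sketch of how the element steps of an EPUBCFI path can be derived inside one content document. In CFI, the Nth child element of a node gets the even index 2N, and an ID, when present, is appended in square brackets as a check. The sample document and the helper cfi_steps are hypothetical, and the part of the CFI that locates the document within the package is left out.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical content document used only for illustration.
DOC = """\
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body id="body01">
    <p>First paragraph.</p>
    <p id="para02">Second paragraph.</p>
  </body>
</html>
"""


def cfi_steps(root, target):
    """Return the CFI element steps leading from root down to target."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        position = list(parent).index(node)  # zero-based among element children
        step = "/" + str(2 * (position + 1))  # elements take even indices
        if node.get("id"):
            step += "[" + node.get("id") + "]"
        steps.append(step)
        node = parent
    return "".join(reversed(steps))


root = ET.fromstring(DOC)
target = root.find(".//" + XHTML + "p[@id='para02']")
print(cfi_steps(root, target))
```

The bracketed IDs let a reading system confirm it has landed on the intended element even if the surrounding markup has shifted, which is one reason extra target IDs are worth adding.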


EPUB 2.0.1 actually consists of two schemas: EPUB and DTBook. DTBook was intended to provide content to assistive systems for visually impaired readers through Braille readers and other technologies. Because of the accessibility features within HTML5, it was decided that DTBook could be deprecated and the functionality rolled into EPUB. So technically, EPUB3 files are accessible ‘by design.’

Publishers should do everything within reason to ensure that all items within their content are accessible. This includes descriptions of all images and alternative text for MathML and scripts.


Hopefully this quick dive into the spec provides enough context for you to at least know what to expect as we move into the new eBook formatting realm of EPUB3. Going forward, there will undoubtedly be lots of new capabilities as best practices get solidified and reading systems become even more advanced. So stay tuned. The final membership vote is expected in late August or early September, and I’ll be reporting back with updates.


Eric Freese is a Solutions Architect with Aptara, which provides digital publishing solutions that deliver significant gains in quality, time-to-market and production costs for eBook publishers.

8 thoughts on “Breaking it Down: the ePub 3 Spec”

    1. Sara Slack

      I would have to agree with you only to a certain extent.

      I think there is unfortunately going to be a glut of new ‘authors’ (the term must be applied in a more and more loose manner as time goes on) deciding to use these types of features, and there will be HUGE amounts of overkill. Personally, if I wanted a multimedia experience, I would either listen to music…or watch a video…or read a book. At most I would listen to music whilst reading…but the sad inevitability is that so much will be crammed into one file that the ‘reading’ element of the ‘book’ will be lost. I know that Michael Lane has said that “The future of publishing lies in the necessity for it to become part of the wider field of communications media,” but really?

      The bit I would have to disagree with, is the prospect of utilising this type of file for non-fiction books. Imagine a how-to manual that gave you a video, step-by-step account of how to do something? Or a book about the latest pop music trends that could actually give you audio snippets as examples?

      So to just put my comment in a nutshell: Good for non-fiction, bbaaaad for fiction.

      1. Eric Freese

        I would tend to agree with both of you on some points. Adding these extra things solely because you can, is not a good reason. When we work with publishers we strive to make sure that the addition of media and scripting enhance the reading experience rather than interfere with it.

        I won’t say that media doesn’t belong in fiction works. I could imagine movie clips being added to a fiction work, just as paperback releases featuring photos from the movie are sometimes issued when a movie is released. But again, the media should not detract from the reading experience.

        One thing I would like to see is the ability to turn off the things you don’t want. This would likely be done at the reading system level with the eBook files containing everything and the device/app removing (not presenting) the parts the person does not want to deal with.

    2. Roymond

      Not looking for audio/visual to be added to novels, but a new medium is emerging in transmedia, and even non-fiction has much use for it. Imagine reading the history of an era and hearing speeches by iconic personalities of the period.

    3. Francisco Martinez

      “Just reading a book” for some people means combining the plain text with some media, i.e. audio.

      I’m talking about blind and visually impaired persons here, and this is why EPUB3 looks so promising. No publisher is forced to add audio and video to the ebooks they produce using EPUB3, but those who produce books for blind and visually impaired persons are given the possibility of producing their books using the exact same standard that is being used for commercial e-books. Besides, commercial books produced using the EPUB3 specification may be accessible (readable for all) without any further modifications or adaptations.

      Looking forward to seeing it being implemented.


  1. Joseph Sant

    Are there any changes in the navigation capabilities of a book? It doesn’t seem like you have multiple navmaps or a navmap that can be changed dynamically (except to choose alternate resources when one is available). I am thinking about textbooks, since different instructors will typically specify a different way of navigating through a textbook. It would be nice if it was possible for a navmap to match different instructors’ navigation schemes instead of the author’s.


    1. Eric Freese

      EPUB3 does improve navigation capabilities. There is an ability to provide an overall table of contents, with the ability to hide some items from view without affecting accessibility. The spec also defines additional navigation elements for page lists and landmarks. Finally, the spec allows user-defined navigation elements in order to support other semantic types of navigation. For example, a TOC could be set up based on an ontology that presents the information in a hierarchy, no matter where the information is located in the EPUB.

  2. Lo Yuk Fai

    An off-topic question: Any idea when EPUB will get a proper annotation standard for bookmarks, highlights, text and freehand…?



