Next-Generation Book Publishing: Of the HTML, by the HTML, for the HTML

Expert publishing blog opinions are solely those of the blogger and not necessarily endorsed by DBW.

A funny thing happened when books went digital. The traditional publishing industry, whose livelihood had relied on the printing press for hundreds of years, suddenly found itself in the software business. More specifically, as ebook demand took off, publishers found they were in the business of selling websites, because the two major ebook file formats (EPUB and MOBI) are each, at heart, a “website in a box”: content archived in HTML, styled with CSS and, ereader permitting, enhanced by a variety of other technologies and markup languages from the modern Web stack.

We’re approximately seven years into the modern “ebook revolution” now (as measured by the introduction of the Kindle), but I believe the fact that ebooks are fundamentally web applications still hasn’t fully hit home for many book publishers. It’s extremely tempting to instead treat EPUB and MOBI as if they were just another file export format, like a Microsoft Word .doc file. Such a conception fits neatly into the book-production paradigm that has prevailed for the past twenty or thirty years: you author in a word-processing application, typeset and design in a desktop-publishing application and finally convert/export the content for print. In this model, “producing” ebooks entails just another conversion, which can easily be outsourced as a postproduction afterthought.

But as I’ll discuss further in a Digital Book World webcast next Tuesday, we at O’Reilly Media have spent the past two years exploring the deep implications of book content going digital, and the central role that the HTML5 standard now plays as the canonical markup format in the marketplace of long-form electronic texts. We asked ourselves, “What should contemporary book production and workflow look like in a post-digital world?” We wondered, “What would happen if we re-architected our entire toolchain with HTML5 technology at its core?”

To answer these questions, we built Atlas, a next-generation publishing platform for authoring/editing book content, designing templates for that content, and one-click publishing to both print and digital outputs. Here’s a look at what we achieved when we looked beyond HTML as “just another output format.”


Of the HTML: HTML5 as manuscript source

One of the first questions we asked ourselves at O’Reilly was, “If HTML5 is the core markup format of our ebook outputs, what would happen if HTML5 was also the core markup format of our source manuscripts?” This question was quickly followed by, “Can we actually mark up our source manuscripts with HTML5?”

The initial answer was no, because HTML5 fell short in two key areas. First, its semantics weren’t rich enough to cover the markup needs of a typical book (we needed a way to tag chapters, prefaces, appendixes, etc.). Second, its content model wasn’t rigid enough to standardize conventions for structuring book documents (e.g., how to consistently represent a book’s section hierarchy).

To address both these shortcomings, we developed HTMLBook, an open XHTML5-based specification for semantic tagging and structuring of book content. Using HTMLBook, we were able to mark up manuscripts right in HTML5, such as this excerpt from the first chapter of Alice’s Adventures in Wonderland:

<section data-type="chapter">
  <h1>Down the Rabbit-Hole</h1>
  <p>Alice was beginning to get very tired of sitting by her sister on the bank...</p>
</section>
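To make the idea of a consistent section hierarchy a little more concrete, here is a rough Python sketch (not part of the Atlas toolchain) that reads HTMLBook-style markup and prints an outline based on the data-type attributes. The sample manuscript, the placeholder headings and the OutlineParser and outline names are invented for illustration; only the pattern of tagging book components with data-type on section elements comes from HTMLBook.

```python
from html.parser import HTMLParser

# Hypothetical sample manuscript. The headings other than the chapter title
# are placeholders; the data-type values follow HTMLBook's convention of
# labeling book components on <section> elements.
MANUSCRIPT = """
<body data-type="book">
  <section data-type="preface"><h1>Placeholder Preface</h1><p>...</p></section>
  <section data-type="chapter">
    <h1>Down the Rabbit-Hole</h1>
    <p>Alice was beginning to get very tired of sitting by her sister on the bank...</p>
    <section data-type="sect1"><h1>Placeholder Subsection</h1><p>...</p></section>
  </section>
  <section data-type="appendix"><h1>Placeholder Appendix</h1><p>...</p></section>
</body>
"""


class OutlineParser(HTMLParser):
    """Collect a (nesting depth, data-type) pair for every <section>."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            # Record the section at the current depth, then descend into it.
            self.outline.append((self.depth, dict(attrs).get("data-type")))
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "section":
            self.depth -= 1


def outline(html):
    """Return the list of (depth, data-type) pairs found in the markup."""
    parser = OutlineParser()
    parser.feed(html)
    return parser.outline


if __name__ == "__main__":
    for depth, kind in outline(MANUSCRIPT):
        print("  " * depth + str(kind))
```

Because every structural element carries an explicit data-type, a tool like this can tell a preface from a chapter from an appendix without guessing from heading text, which is the gap plain HTML5 left open.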

Having manuscript files in semantically rich HTML5 immediately paid huge dividends for us. When both source and output files were stored in the same file format, heavyweight conversions disappeared from the workflow. Automating transforms to EPUB or MOBI output from our HTMLBook content was relatively trivial. Similarly, we were able to automate creating PDF for print or Web, with the help of a commercial formatter.
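To show why the EPUB transform gets so light once the source is already HTML, here is a minimal Python sketch of the “website in a box” idea: it zips an XHTML chapter and a stylesheet into an EPUB-style container. This is not the Atlas build code. The file names, the placeholder identifier and the build_epub function are assumptions, and the package omits pieces a validator would require (such as the navigation document and a last-modified timestamp).

```python
import io
import zipfile

# The chapter is the HTMLBook excerpt wrapped in a bare XHTML page.
CHAPTER = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Down the Rabbit-Hole</title>
<link rel="stylesheet" type="text/css" href="theme.css"/></head>
<body data-type="book">
<section data-type="chapter">
<h1>Down the Rabbit-Hole</h1>
<p>Alice was beginning to get very tired of sitting by her sister on the bank...</p>
</section>
</body>
</html>
"""

CSS = "h1 { font-family: serif; }\n"  # hypothetical one-rule theme

# container.xml tells the reading system where the package file lives.
CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# The package file lists the book's resources (manifest) and reading order (spine).
# The identifier below is a placeholder, not a real book ID.
OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Alice's Adventures in Wonderland</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ch01" href="ch01.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="theme.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch01"/>
  </spine>
</package>
"""


def build_epub():
    """Return the bytes of a bare-bones EPUB-style zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry must come first and be stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        for name, data in [
            ("META-INF/container.xml", CONTAINER),
            ("OEBPS/content.opf", OPF),
            ("OEBPS/ch01.xhtml", CHAPTER),
            ("OEBPS/theme.css", CSS),
        ]:
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


if __name__ == "__main__":
    with zipfile.ZipFile(io.BytesIO(build_epub())) as book:
        for name in book.namelist():
            print(name)
```

Note that the chapter markup goes into the archive essentially untouched. Most of the work is bookkeeping around it, which is why a same-format source removes the heavyweight conversion step.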

Atlas leveraged the automated toolchain we built for HTML5-source content in the form of a one-click build system. Creating a PDF, EPUB or MOBI from a manuscript became as easy as clicking a checkbox and a button:

The Atlas Build dashboard

With this build functionality, Atlas effectively eliminated any cost or time entailed in the ebook conversion process, making it possible to release content into the market early and frequently.

In our HTML5 source model, we were also able to enforce a separation between content and presentation. Structure and semantics were encapsulated in HTML markup, and book designs were maintained in separate CSS stylesheets, which could be applied to any HTMLBook manuscript. In Atlas, we created a series of themes from which users could select a design for their print and ebook outputs:

Book design “themes” available in Atlas

And thus switching the design of a book was as simple as applying a different stylesheet:

The same book content, in two different stylesheets
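The separation of content and presentation described above can be sketched in a few lines of Python. This is an illustration only, not how Atlas implements themes: the theme names, the stylesheet paths and the render function are hypothetical. The point is that the manuscript markup stays byte-for-byte identical while only the stylesheet reference changes.

```python
from string import Template

# A bare HTML page shell; only the stylesheet reference varies by theme.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>$title</title>
<link rel="stylesheet" href="$stylesheet">
</head>
<body data-type="book">
$content
</body>
</html>
""")

# Hypothetical theme names and stylesheet paths, for illustration only.
THEMES = {
    "classic": "themes/classic.css",
    "modern": "themes/modern.css",
}

# The manuscript content is the same HTMLBook markup for every theme.
CONTENT = """<section data-type="chapter">
  <h1>Down the Rabbit-Hole</h1>
  <p>Alice was beginning to get very tired of sitting by her sister on the bank...</p>
</section>"""


def render(content, title, theme):
    """Wrap unchanged manuscript markup in a page that links the chosen theme."""
    return PAGE.substitute(title=title, stylesheet=THEMES[theme], content=content)


if __name__ == "__main__":
    classic = render(CONTENT, "Down the Rabbit-Hole", "classic")
    modern = render(CONTENT, "Down the Rabbit-Hole", "modern")
    # The two pages differ only in which stylesheet they reference.
    print(classic.replace(THEMES["classic"], THEMES["modern"]) == modern)
```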

By the HTML: Web-based visual authoring

Building an infrastructure around HTML-as-source works best when the stakeholders who author, edit and produce the content buy into working directly with HTML. Driving adoption of Atlas therefore required a professional-caliber, user-friendly application for editing book documents.

Again, we turned to HTML5 (as well as quite a bit of CSS and JavaScript, of course) as a solution, and built into Atlas a Visual Editor that authors, editors, and production staff alike could use to develop book content. We had two key goals in mind. First, create a beautiful interface for editing and producing HTML manuscripts:

The two previous paragraphs in Atlas’s visual editor. Buttons above allow you to style the text; add links, index markers, and interactive elements; and much more.

Second, harness the advantages of a cloud-based platform to create an environment that fosters collaboration among all of a book’s contributors, both by making it easy to invite others to join a project:

In Atlas, type in an email address to add a new collaborator to a project.

And by adding commenting features in the editor to allow contributors to annotate the manuscript:

Someone likes this blog post!

The value of rich collaboration features cannot be overstated, as they not only ensure efficient coordination among contributors (no more emailing of files!), but also open rich possibilities for building community around content, e.g., having 100 university undergraduates collaborate on writing a textbook for a course.

For the HTML: Embracing ebooks as webapps

HTML5-as-source isn’t just about modernizing workflows, lowering conversion costs, and fostering cloud-based collaboration. It’s also about taking full advantage of Web technology to create truly next-generation digital content, whether that means making innovations in design, interactivity, or multimedia elements.

When building Atlas, we wanted to facilitate the creation of digital-first content by making it easy to incorporate custom CSS and JavaScript right into your project. To support multimedia projects, we built functionality into the Visual Editor to embed video:

Instructional video embedded directly into Atlas’s Visual Editor

With Atlas, we also encourage publishers to “think outside the ereader” and to stop limiting the possibilities of what can be achieved in the ebook medium to the feature set that is currently supported by Kindle, iBooks or Nook. In addition to EPUB, MOBI and PDF, Atlas provides an output straight to HTML for publishing content to the open Web.

O’Reilly’s Raspberry Pi Cookbook on the Web.

In February of 2014, we released Atlas into private beta, and in the coming months we plan to further enhance and refine the platform to continue to tap the vast potential of an HTML-based book production workflow.


If you are a publisher interested in joining O’Reilly’s Atlas beta and helping to shape the future of HTML-based publishing, please sign up for an invite at

And if you want to learn more about next-generation digital workflows, join me for the upcoming Digital Book World webcast “Tailoring Workflows to Digital Content—XML and HTML5+CSS Production for Publishers” on Tuesday, March 25 at 12:00pm EST. Register here.


About Sanders Kleinfeld

Sanders Kleinfeld is Director of Publishing Technology at O’Reilly Media. He developed and continues to maintain the automated XHTML5-source toolchain for generating print and digital book formats used in O’Reilly’s Atlas platform, as well as the HTMLBook specification for authoring in XHTML5. Sanders also focuses on R&D around next-generation workflows and ebook content for O’Reilly and its publishing partners. He is the author of HTML5 for Publishers (O’Reilly, 2011).
