A funny thing happened when books went digital. The traditional publishing industry, whose livelihood had relied on the printing press for hundreds of years, suddenly found itself in the software business. More specifically, as ebook demand took off, publishers found they were in the business of selling websites, because the two major ebook file formats (EPUB and MOBI) are, at heart, a “website in a box”: an archive of content marked up in HTML, styled with CSS and, ereader permitting, drawing on the other technologies and markup languages that make up the modern Web stack.
We’re approximately seven years into the modern “ebook revolution” now (as measured by the introduction of the Kindle), but I believe the fact that ebooks are fundamentally web applications still hasn’t fully hit home for many book publishers. It’s extremely tempting to instead treat EPUB and MOBI as if they were just another file export format, like a Microsoft Word .doc file. Such a conception fits neatly into the book-production paradigm that has prevailed for the past twenty or thirty years: you author in a word-processing application, typeset and design in a desktop-publishing application and finally convert/export the content for print. In this model, “producing” ebooks entails just another conversion, which can easily be outsourced as a postproduction afterthought.
But at O’Reilly Media, we’ve spent the past two years exploring the deep implications of book content going digital, and the central role that the HTML5 standard now plays as the canonical markup format in the marketplace of long-form electronic texts (a topic I’ll discuss further in a Digital Book World webcast next Tuesday). We asked ourselves, “What should contemporary book production and workflow look like in a post-digital world?” We wondered, “What would happen if we re-architected our entire toolchain with HTML5 technology at its core?”
To answer these questions, we built Atlas, a next-generation publishing platform for authoring/editing book content, designing templates for that content, and one-click publishing to both print and digital outputs. Here’s a look at what we achieved when we looked beyond HTML as “just another output format.”
Of the HTML: HTML5 as manuscript source
One of the first questions we asked ourselves at O’Reilly was, “If HTML5 is the core markup format of our ebook outputs, what would happen if HTML5 was also the core markup format of our source manuscripts?” This question was quickly followed by, “Can we actually mark up our source manuscripts with HTML5?”
The initial answer was no, because HTML5 fell short in two key areas. First, it didn’t possess semantics rich enough to encompass the markup needs of a typical book (we needed a way to tag chapters, prefaces, appendixes, etc.). And second, it didn’t possess a content model rigid enough to standardize conventions for structuring book documents (e.g., how to consistently represent a book’s section hierarchy).
To address both these shortcomings, we developed HTMLBook, an open XHTML5-based specification for semantic tagging and structuring of book content. Using HTMLBook, we were able to mark up manuscripts right in HTML5, such as this excerpt from the first chapter of Alice’s Adventures in Wonderland:
<section data-type="chapter">
  <h1>Down the Rabbit-Hole</h1>
  <p>Alice was beginning to get very tired of sitting by her sister on the bank...</p>
</section>
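To give a sense of what this kind of semantic markup makes possible, here is a minimal Python sketch that reads an HTMLBook-style fragment and lists its section hierarchy. It is an illustration only, not part of Atlas or the HTMLBook tooling: the `outline` function is a hypothetical helper, and the nested `sect1` section and its heading are invented for the example.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

# The chapter markup follows the excerpt above; the nested sect1 section
# is invented for illustration and is not from the original manuscript.
MANUSCRIPT = """
<section data-type="chapter">
  <h1>Down the Rabbit-Hole</h1>
  <p>Alice was beginning to get very tired of sitting by her sister on the bank...</p>
  <section data-type="sect1">
    <h1>A hypothetical subsection</h1>
    <p>Placeholder text.</p>
  </section>
</section>
"""


def outline(element, depth=0):
    """Return (depth, data-type, title) for every section, in document order."""
    rows = []
    if element.tag == "section":
        # The first heading that is a direct child serves as the section title.
        heading = next((child for child in element if child.tag in HEADINGS), None)
        title = heading.text if heading is not None else ""
        rows.append((depth, element.get("data-type", ""), title))
        depth += 1
    for child in element:
        rows.extend(outline(child, depth))
    return rows


root = ET.fromstring(MANUSCRIPT.strip())
for depth, kind, title in outline(root):
    print("  " * depth + f"{kind}: {title}")
```

Because the `data-type` attribute says what each section is, a script like this can tell a chapter from a subsection without guessing from heading sizes or styling.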
Having manuscript files in semantically rich HTML5 immediately paid huge dividends for us. When both source and output files were stored in the same file format, heavyweight conversions disappeared from the workflow. Automating transforms to EPUB or MOBI output from our HTMLBook content was relatively trivial. Similarly, we were able to automate creating PDF for print or Web, with the help of a commercial formatter.
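To illustrate why the transform gets lighter when the source is already HTML, here is a rough Python sketch of packaging a chapter into an EPUB-style container using only the standard library. It is not the Atlas build system. The `build_epub` function, the file names, and the placeholder identifier are assumptions made for the example, and the result is a skeleton: it omits pieces a valid EPUB requires (such as the navigation document), so it would not pass validation as written.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Skeleton package file; the identifier is a placeholder, not a real one.
PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ch01" href="ch01.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch01"/>
  </spine>
</package>
"""

PAGE_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>{title}</title></head>
  <body>
{body}
  </body>
</html>
"""

CHAPTER = """<section data-type="chapter">
  <h1>Down the Rabbit-Hole</h1>
  <p>Alice was beginning to get very tired of sitting by her sister on the bank...</p>
</section>"""


def build_epub(chapter_html, title, out):
    """Write a skeleton EPUB container to a path or file-like object."""
    safe_title = escape(title)
    with zipfile.ZipFile(out, "w") as archive:
        # The mimetype entry comes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/content.opf",
                         PACKAGE_OPF.format(title=safe_title),
                         compress_type=zipfile.ZIP_DEFLATED)
        # The manuscript markup is dropped into the page unchanged.
        archive.writestr("OEBPS/ch01.xhtml",
                         PAGE_XHTML.format(title=safe_title, body=chapter_html),
                         compress_type=zipfile.ZIP_DEFLATED)


buffer = io.BytesIO()
build_epub(CHAPTER, "Alice's Adventures in Wonderland", buffer)
with zipfile.ZipFile(buffer) as archive:
    print(archive.namelist())
```

The point of the sketch is that the chapter markup passes through untouched: the "conversion" amounts to wrapping the source in a page and zipping it alongside some metadata.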
Atlas leveraged the automated toolchain we built for HTML5-source content in the form of a one-click build system. Creating a PDF, EPUB or MOBI from a manuscript became as easy as clicking a checkbox and a button.
With this build functionality, Atlas effectively eliminated any cost or time entailed in the ebook conversion process, making it possible to release content into the market early and frequently.
In our HTML5 source model, we were also able to enforce a separation between content and presentation. Structure and semantics were encapsulated in HTML markup, and book designs were maintained in separate CSS stylesheets, which could be applied to any HTMLBook manuscript. In Atlas, we created a series of themes from which users could select a design for their print and ebook outputs.
And thus switching the design of a book was as simple as applying a different stylesheet.
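The separation of content and presentation can be sketched in a few lines of Python. This is an assumption-laden illustration rather than how Atlas implements themes: the theme names, the CSS paths, and the `render` function are all hypothetical.

```python
from string import Template

# Page wrapper: the only presentation-specific part is the stylesheet link.
PAGE = Template("""<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8"/>
    <link rel="stylesheet" href="$stylesheet"/>
  </head>
  <body>
$body
  </body>
</html>
""")

# Hypothetical theme names and stylesheet paths, invented for this example.
THEMES = {
    "technical": "themes/technical.css",
    "trade": "themes/trade.css",
}

CHAPTER = """<section data-type="chapter">
  <h1>Down the Rabbit-Hole</h1>
  <p>Alice was beginning to get very tired of sitting by her sister on the bank...</p>
</section>"""


def render(manuscript_html, theme):
    """Wrap unchanged manuscript markup in a page that links the chosen theme."""
    return PAGE.substitute(stylesheet=THEMES[theme], body=manuscript_html)


technical_page = render(CHAPTER, "technical")
trade_page = render(CHAPTER, "trade")
# The two pages differ only in the stylesheet they reference.
print(technical_page.replace(THEMES["technical"], THEMES["trade"]) == trade_page)
```

The manuscript string is never edited when the theme changes, which is the property that lets one stylesheet serve any number of books.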
By the HTML: Web-based visual authoring
Building an infrastructure around HTML-as-source works most effectively when the stakeholders who are authoring, editing and/or producing the content buy into working directly with HTML. To win that buy-in for Atlas, we needed a professional-caliber, user-friendly application for editing book documents.
We also wanted to harness the advantages of a cloud-based platform to create an environment that fosters collaboration among all of a book’s contributors, both by making it easy to invite others to join a project and by adding commenting features in the editor that let contributors annotate the manuscript.
The value of rich collaboration features cannot be overstated: they not only ensure efficient coordination among contributors (no more emailing of files!), but also open rich possibilities for building community around content, such as having 100 university undergraduates collaborate on writing a textbook for a course.
For the HTML: Embracing ebooks as webapps
HTML5-as-source isn’t just about modernizing workflows, lowering conversion costs, and fostering cloud-based collaboration. It’s also about taking full advantage of Web technology to create truly next-generation digital content, whether that means making innovations in design, interactivity, or multimedia elements.
With Atlas, we also encourage publishers to “think outside the ereader” and to stop limiting the possibilities of what can be achieved in the ebook medium to the feature set that is currently supported by Kindle, iBooks or Nook. In addition to EPUB, MOBI and PDF, Atlas provides an output straight to HTML for publishing content to the open Web.
In February of 2014, we released Atlas into private beta, and in the coming months we plan to further enhance and refine the platform to continue to tap the vast potential of an HTML-based book production workflow.
If you are a publisher interested in joining O’Reilly’s Atlas beta and helping to shape the future of HTML-based publishing, please sign up for an invite at atlas.oreilly.com.
And if you want to learn more about next-generation digital workflows, join me for the upcoming Digital Book World webcast “Tailoring Workflows to Digital Content—XML and HTML5+CSS Production for Publishers” on Tuesday, March 25 at 12:00pm EST. Register here.