Why Semantics? How Ebook Markup Is a Sign of a Publisher’s Health

Expert publishing blog opinions are solely those of the blogger and not necessarily endorsed by DBW.

The theme this month here on Digital Book World is data and analytics. In my last post, I talked about how much data is hidden beneath the surface of books, like diamonds in a Minecraft world. Today, I would like to expand on one of the points I made in that article: that semantic markup is an important part of the publishing process.

As the title of this post plainly says, I see the presence and proper use of semantic markup in the production process as one of the signs of a publisher’s health. This may not have been as true just 10 years ago, but the publishing world has changed considerably as digital media consumption has grown. This is not just about the expansion of ebook sales since the release of the Kindle; it is about the fact that publishers now compete with many other content providers and media creators for the limited attention and dollars consumers have to give.

Publishers are in a sticky place: consumers are spending their entertainment and education budgets on more and more kinds of media, including subscription services like Netflix and Pandora; games on mobile devices, computers and consoles; movies; YouTube; music; podcasts; and more. Publishers need to be able to compete with these other media options and showcase the benefits of their unique content vetting and preparation processes, not just for print and ebooks, but for other forms of media.

This is where semantic markup comes into play. As I mentioned in my last post, publishers have a ton of content in their archives. Publishers have historically focused on creating new content and moving on to the next new thing, but there are many reasons why they can and should consider how to get more use out of the content they already have.

This is not about the “long tail,” though. (Besides, the popular understanding of that might be wrong anyway.) This is about taking advantage of content sitting in archives, whether that effort focuses only on the blockbuster titles or on every title that hits the shelves.

What is semantic markup? It is a way of tagging content inside HTML, XML, EPUB, and similar formats according to what each piece of content is, rather than how it should look. A chapter heading is a good example. A chapter heading means something: it marks the beginning of a section of content and often provides some context for what that content covers. However, if you don’t know that it’s a heading, how can you find that section?

Imagine you’re reading a book and all the chapter headings are designed as regular text and show up in the book like all the other paragraphs. That would be frustrating and potentially very confusing. Your eye relies on the design of a heading to recognize it as a heading, and even to judge its level within the structure of the book. Semantic markup allows computers to do the same thing, by marking the content in a consistent, usable way.
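To make the heading example concrete, here is a minimal sketch in Python, assuming EPUB 3 content documents written as XHTML. The sample markup, the class names `big-bold` and `medium-bold`, and the `outline` function are hypothetical illustrations, not any particular publisher’s workflow. The function builds a simple outline from `h1` through `h6` elements; when the same headings are only styled as bold paragraphs, the tool has nothing to find.

```python
import xml.etree.ElementTree as ET

# Heading elements, h1 (top level) through h6 (deepest).
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

# Hypothetical sample: a chapter opening marked up semantically...
SEMANTIC = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
<body>
  <section epub:type="chapter">
    <h1>Chapter 1: The Archive</h1>
    <p>Opening paragraph.</p>
    <h2>Why Old Titles Matter</h2>
    <p>More text.</p>
  </section>
</body>
</html>"""

# ...and the same text where the headings are only styled to look like headings.
PRESENTATIONAL = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
  <p class="big-bold">Chapter 1: The Archive</p>
  <p>Opening paragraph.</p>
  <p class="medium-bold">Why Old Titles Matter</p>
  <p>More text.</p>
</body>
</html>"""


def outline(xhtml):
    """Return (level, text) for every h1-h6 element, in document order."""
    root = ET.fromstring(xhtml)
    result = []
    for el in root.iter():
        # ElementTree reports namespaced tags as "{namespace}localname".
        local = el.tag.rsplit("}", 1)[-1]
        if local in HEADING_TAGS:
            result.append((int(local[1]), "".join(el.itertext()).strip()))
    return result


for level, text in outline(SEMANTIC):
    print("  " * (level - 1) + text)
# prints:
# Chapter 1: The Archive
#   Why Old Titles Matter

print(outline(PRESENTATIONAL))  # prints []: there are no h1-h6 elements to find
```

The `epub:type="chapter"` attribute in the sample comes from the EPUB 3 structural semantics vocabulary. This sketch does not use it, but it is another semantic hook a tool could rely on to locate chapters.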

Of course, the semantics in a book go much deeper than just the headings or chapters. Semantic markup can be used to define all kinds of things within a book. For example, the charts and graphs in a non-fiction book could, if marked up correctly and consistently, be extracted by a tool and added to the book’s website for discoverability and to give readers access to the data in a different environment. Those charts and graphs could also be supplemented by the actual data tables the author used to create them, allowing readers to dive even deeper into the data, all while engaging with the publisher and author on the Web, not just in the book.
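A tool that pulls charts out of a book could be as small as the following Python sketch. It assumes, hypothetically, that a publisher’s house style marks every chart as an XHTML `<figure>` with `class="chart"`, a `<figcaption>`, and a `data-source` attribute pointing to the author’s underlying data table. Those conventions, the `extract_charts` function, and the sample chapter are all invented for illustration. The output is JSON that a book website could consume.

```python
import json
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical sample chapter: one chart and one ordinary photo.
CHAPTER = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
  <p>Sales grew steadily over the decade.</p>
  <figure id="fig-3-1" class="chart" data-source="data/fig-3-1.csv">
    <img src="images/fig-3-1.png" alt="Line chart of annual sales"/>
    <figcaption>Figure 3.1: Annual sales by year</figcaption>
  </figure>
  <figure id="photo-3-2">
    <img src="images/author.jpg" alt="Portrait of the author"/>
    <figcaption>The author at work</figcaption>
  </figure>
</body>
</html>"""


def extract_charts(xhtml):
    """Collect every <figure class="chart"> with its image, caption and data file."""
    root = ET.fromstring(xhtml)
    charts = []
    for fig in root.iter(f"{XHTML}figure"):
        if "chart" not in fig.get("class", "").split():
            continue  # skip photos and other non-chart figures
        img = fig.find(f"{XHTML}img")
        caption = fig.find(f"{XHTML}figcaption")
        charts.append({
            "id": fig.get("id"),
            "image": img.get("src") if img is not None else None,
            "alt": img.get("alt") if img is not None else None,
            "caption": "".join(caption.itertext()).strip() if caption is not None else None,
            # Hypothetical convention: link to the author's data table.
            "data": fig.get("data-source"),
        })
    return charts


print(json.dumps(extract_charts(CHAPTER), indent=2))
```

Because the photo in the sample lacks the `chart` class, it is skipped. The script works only because the markup is consistent; a chart tagged some other way would simply be missed.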

Note one more important point about this example: the work is being done by a tool, not by a human. Semantic markup provides the structural backbone for automated processes that reduce human effort while still producing powerful results.

There are lots of other examples of how semantic markup can help a publisher, and I could go into more technical detail about how to set that up. However, my point here is simply that the more a publisher considers how its content is tagged semantically, and the more consistent it is with that tagging, the more useful that information will be for the publisher and for its readers.
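Consistency can be checked automatically, too. This Python sketch flags paragraphs that are probably headings in disguise: either they carry a class name from a hypothetical list of heading-like styles, or their entire content is bold. The class names, the `find_fake_headings` function, and the heuristics themselves are assumptions for illustration, not a standard rule set; a real house style would supply its own list.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
# Hypothetical class names a house style might use for "fake" headings.
SUSPECT_CLASSES = {"chapter-title", "heading", "big-bold", "subhead"}


def find_fake_headings(xhtml):
    """Flag <p> elements styled as headings instead of using h1-h6."""
    root = ET.fromstring(xhtml)
    flagged = []
    for p in root.iter(f"{XHTML}p"):
        classes = set(p.get("class", "").split())
        children = list(p)
        # True when the paragraph's only content is a single <b> or <strong>.
        only_bold = (
            len(children) == 1
            and children[0].tag in (f"{XHTML}b", f"{XHTML}strong")
            and not (p.text or "").strip()
            and not (children[0].tail or "").strip()
        )
        if classes & SUSPECT_CLASSES or only_bold:
            flagged.append("".join(p.itertext()).strip())
    return flagged


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <h1>Chapter 1</h1>
  <p class="subhead">A Heading in Disguise</p>
  <p><b>Another One</b></p>
  <p>An ordinary paragraph with <b>some</b> bold text.</p>
</body></html>"""

for text in find_fake_headings(SAMPLE):
    print("Possible untagged heading:", text)
# prints:
# Possible untagged heading: A Heading in Disguise
# Possible untagged heading: Another One
```

A check like this will not fix anything by itself, but running it over a backlist is one inexpensive way to see how consistent the existing tagging really is.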

In my opinion, that is the future of publishing: not just publishing one book and moving on to the next, but developing all that content in a variety of ways without expending a lot of additional energy in the process.

Is your publishing house healthy? Are you considering how your content is tagged and working on ways to give it more semantic structure?
