Will an Open Web Liberate Reading Data?

Expert publishing blog opinions are solely those of the blogger and not necessarily endorsed by DBW.

At last week’s BEA, it was announced that the IDPF, the standards body for ebooks and responsible for the current EPUB specification, is considering merging with the W3C, the standards body for the web at large. This would mean that instead of having its own standards body for digital books, the publishing industry would be a smaller part of the much wider online publishing world, including magazines, newspapers, blogs, websites and more.

Some, like Peter Brantley, organizer of the annual “Books in Browsers” conference, greeted the news with enthusiasm, while others, such as ebook wizard Baldur Bjarnason, were more skeptical. Dave Cramer’s (Hachette) response is also worth reading.

One of the reasons for the merger, according to Sir Tim Berners-Lee, father of the web, was the prospect of making ebooks trackable.

“We should live in a world of linked data,” Berners-Lee said.

It is this issue that I would like to address in today’s post.

First off, what will change as a result of the merger, and will we suddenly have treasure troves of data at our fingertips? Well, probably very little will change in the near future. The merger has not yet even been agreed to, and any new framework emerging may take years to be proposed, drafted, discussed and refined.

In this context, it is worth noting that EPUB 3 was introduced more than five years ago and is still not used by all publishers. Most egregiously, it is not even used by leading reading applications yet, such as Nook by Barnes & Noble, Tolino, the leading ebook app in Germany, Aldiko (owned by Feedbooks) and Bluefire Reader, though the latter two companies have announced plans to support EPUB 3 by the end of this year.

In other words, technological change in the book publishing industry does not happen fast. This will not come as a surprise to any insider. Even outsiders know that book publishing is a very conservative industry.

But let us fast forward several years and imagine that books now live in the “open web.” Does that mean they will now be much easier to track, because instead of reading books inside Kindle, Nook and Kobo, we will be reading them in “normal” web browsers, such as Chrome (Google), Firefox (Mozilla), Safari (Apple) or Internet Explorer (Microsoft)? You will already notice that we are exchanging one set of technological giants (Amazon and Apple) for another (Google and Microsoft).

Also, maybe books will not be read on the “open” web, but in the closed web that is Facebook. If that were to occur, Facebook would be the new gatekeeper. Today Google is already unable to track and analyze what happens inside Facebook, so as brave as the proposal by Berners-Lee is, it may not make books really more trackable if they merely moved from one walled garden (Amazon) to another (Facebook).

But let us assume they really live on the genuinely open web of browsers like Chrome, Firefox, Safari and Internet Explorer and not inside the walled garden of Facebook. Now, you could surely use Google Analytics to understand the reading behavior of users? Well, not so fast.

Other forms of content, like audio (SoundCloud) and video (YouTube), live on the open web, and Google Analytics gives you only limited insights. The download statistics are still owned by those who host the content, i.e. SoundCloud and YouTube. Books are no different. Most readers get their books from booksellers like Amazon, B&N, Kobo and others, not publishers (there are notable exceptions like tor.com, harlequin.com and lostmyname.com).

Tracking technologies such as candy.js by Jellybooks would be much easier to deploy in such an environment, but you will still need this sort of customized tracking technology to measure how media is consumed unless it’s small snippets of webpages. Long-form content of more than 10,000 words (and that means books) will still require dedicated analytics tools.

It’s also worth pointing out that Google Analytics primarily deals with how users navigate from one webpage to another (see, for example, this lovely cartoon from xkcd). Google Analytics doesn’t deal with things like reading engagement, whether you scroll and flip pages, where and when you pause, if you finish the particular book and what the audience for that book looks like. Book reading is still very, very different from browsing the web. Google Analytics was developed for a totally different form of user engagement, which is searching for snippets of information, shopping for goods, navigating from one webpage to another, and will not help an author or publisher understand reader engagement. That’s why specialized tools are used in addition to Google Analytics by many who specialize in this area, and book publishing will be no different.

However, let’s investigate this from another angle: what if the APIs and interfaces of the open web were available inside today’s EPUB standard? Many of these are crippled or unsupported in today’s EPUB 3.0.1 standard. This is something we could fix without a merger.

First, EPUB 3.0.1 removed the support for POST and now only allows GET. For Jellybooks users, this creates an inferior user experience. There are many cases in which a better experience can be created for readers when one can extract and transmit data in the browser using the POST method rather than the GET method. This is a feature of the open web that we should bring back to ebooks.
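To make the difference concrete, here is a minimal sketch of the same batch of reading events prepared for each method. The `ReadingEvent` shape, the function names and the endpoint are hypothetical, invented for illustration; they are not the candy.js format or anything defined by EPUB.

```typescript
// Hypothetical event shape for illustration only (not the candy.js format).
interface ReadingEvent {
  bookId: string;
  chapter: number;
  action: "open" | "pause" | "close";
  timestamp: number; // milliseconds since the Unix epoch
}

// With GET, the whole payload has to be squeezed into the URL's query string.
function buildGetUrl(endpoint: string, events: ReadingEvent[]): string {
  const payload = encodeURIComponent(JSON.stringify(events));
  return `${endpoint}?events=${payload}`;
}

// With POST, the payload travels in the request body and the URL stays short.
interface PostRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildPostRequest(endpoint: string, events: ReadingEvent[]): PostRequest {
  return {
    url: endpoint,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(events),
    },
  };
}
```

In a browser, the POST variant could be sent with `fetch(req.url, req.init)`. The GET variant grows with every event added to the batch, and since the practical limits on URL length vary between clients and servers, larger batches are more comfortably carried in a request body.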

Two other features we are sorely missing to make books more trackable would be support for these two APIs:

Find out where a piece of the webpage (ebook chapters are webpages) is in relation to the viewport.
Note on why this is useful: You can establish what the reader (rather than the machine) is actually looking at, because you know what is actively displayed on the smartphone, tablet or e-reader screen.
‘visibilitychange’ event and ‘document.visibilityState’ property
Find out if a chapter/HTML file is visible to the reader.
Note on why this is useful: Ebook reading apps like iBooks aggressively pre-load chapters, so companies like Jellybooks have to deploy all sorts of “clean-up” algorithms to determine whether a reader has actually opened and started reading a chapter.
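On the open web, both checks take only a few lines. The sketch below assumes the element's position comes from something like the browser's `getBoundingClientRect()` (an assumption on my part about which positioning API is meant) and uses the `visibilitychange` event and `visibilityState` property named above. The interface and function names are hypothetical, and the document is passed in as a parameter so the logic can be tried outside a reading app.

```typescript
// The subset of a DOMRect this sketch needs, e.g. from getBoundingClientRect().
interface RectLike {
  top: number;
  bottom: number;
}

// Fraction (0..1) of an element's height that lies inside a viewport
// spanning 0..viewportHeight in the same coordinate system as the rect.
function visibleFraction(rect: RectLike, viewportHeight: number): number {
  const height = rect.bottom - rect.top;
  if (height <= 0 || viewportHeight <= 0) return 0;
  const visibleTop = Math.max(rect.top, 0);
  const visibleBottom = Math.min(rect.bottom, viewportHeight);
  return Math.max(0, visibleBottom - visibleTop) / height;
}

// The subset of Document this sketch needs; a real `document` satisfies it.
interface VisibilityDoc {
  visibilityState: string;
  addEventListener(type: string, listener: () => void): void;
}

// Reports the current state once, then again on every visibility change,
// so a pre-loaded but hidden chapter is not counted as being read.
function watchVisibility(doc: VisibilityDoc, onChange: (visible: boolean) => void): void {
  onChange(doc.visibilityState === "visible");
  doc.addEventListener("visibilitychange", () => {
    onChange(doc.visibilityState === "visible");
  });
}
```

In a browser this would be wired up as `watchVisibility(document, handler)` and `visibleFraction(el.getBoundingClientRect(), window.innerHeight)`. Whether a given reading app exposes these APIs to ebook content is exactly the open question.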

These would allow us to track how readers page through a book rather than just analyze when they open, pause or close a chapter. The latter already tells us a lot about readers, but we could gain even more granular insights if we could reach down to the level of the individual page. These are APIs that are part of the open web but not supported in today’s EPUB standard. We could improve the trackability of books immensely with such small tweaks.
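As an illustration of what page-level data would allow, here is a minimal sketch that turns a sequence of page turns into time spent per page. The `PageTurn` shape and the function name are hypothetical, and the sketch assumes the turns arrive sorted by timestamp; it is not how Jellybooks actually processes its data.

```typescript
// Hypothetical record of a reader arriving on a page.
interface PageTurn {
  page: number;
  timestamp: number; // milliseconds since the start of the session
}

// Sums how long each page was on screen. Each page is considered displayed
// until the next turn; the last one is closed by sessionEnd.
// Assumes `turns` is sorted by timestamp in ascending order.
function dwellTimePerPage(turns: PageTurn[], sessionEnd: number): Map<number, number> {
  const totals = new Map<number, number>();
  for (let i = 0; i < turns.length; i++) {
    const next = i + 1 < turns.length ? turns[i + 1].timestamp : sessionEnd;
    const elapsed = Math.max(0, next - turns[i].timestamp);
    totals.set(turns[i].page, (totals.get(turns[i].page) || 0) + elapsed);
  }
  return totals;
}
```

A reader who flips back to an earlier page has both visits added together, which is the kind of re-reading signal that chapter-level open and close events cannot show.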

Furthermore, one of the big problems with all this is that so many people in the publishing industry have limited bandwidth and resources (travel cost) and too little time to engage with standards bodies. A merger will not improve this situation. Thus, we in the publishing industry suffer from an underdeveloped ecosystem that we have abdicated to others and that is, as a result, now controlled by platforms like Amazon, Apple and others who don’t have that much economic or philosophical interest in books.

It’s a shame really, but let us make the best of what we have today!

Earlier posts in the data-smart publishing series:
“The Internet of Bookish Things”
“Reading Fast and Slow – Observing Book Readers in Their Natural Habitat”
“Start Strong or Lose Your Readers”
“What Books Have the X-Factor? Measuring a Book’s Net Promoter Score”
“Men Are from Mars, Women Are from Venus, But What About Readers?”
“How Does Age Affect Reading?”
“8 Reasons Why People Buy Books”
“Data Vs. Instinct – The Publisher’s Dilemma”
“It’s the Cover, Stupid! Why Publishers Should A/B Test Book Covers”
“Foreign Rights and Reader Analytics”
“The Great Amazon Page Count Mystery”
“Reader Analytics Is No Silver Bullet”


2 thoughts on “Will an Open Web Liberate Reading Data?”

  1. Michael W. Perry

    Andrew, I’m sure you are just doing your job, but you’ve ruined my day with this news. My frustrations include:

    1. Books, whether print or digital, are among the few areas of our entertainment and learning life that aren’t made miserable, not just by all-pervasive ads, but by nasty schemes that track what we do and target that advertising in a host of disturbing ways.

    Imagine, for instance, political ads that present a candidate one way to one audience and the totally opposite way to a different audience. That isn’t just irritating. It’s dangerous. It could also be our future. And how about tracking not just what books we read, but where we pause in our reading and drawing conclusions, valid or not, from that? Cue scary music. That makes Orwell’s Big Brother look radically libertarian. His all-pervasive cameras did not track our eye movements.

    2. Here is a tough question. Does Ford take its largest, turbo-charged diesel pickups as a model for its subcompact cars? No, it has more sense than that. Yet I’m picking up hints from your article and elsewhere that this merger intends to make ebooks simply a variation of websites. That makes no sense. Since webpage content is short, people do not mind scrolling through webpages. Digital books are far longer, so users not only prefer paging through them, they won’t tolerate anything else. Read. Page. Read. Page. That’s what they do.

    And yet, if you look at the existing standards and show some judgment about what these webpage gurus are likely to create if given control of ebook standard making, you’ll realize that they haven’t a clue about how ebook standards and ereaders should work. A turn-the-page-based interface has to be designed completely differently from a scrolling one. Ebook standards and ereaders need to know how to create attractive pages, yes pages, from the flow of text and images and do so intelligently despite widely varying display sizes. That is where the effort needs to be devoted. That no one involved seems to realize. What works for scrolling webpages, they seem to think, will work equally well for page-at-a-time reading.

    Not so. It’s like Ford adding a 50-gallon fuel tank to its subcompact cars under the assumption that a vehicle that gets 40 MPG needs the same fuel capacity as one that gets 8 MPG. Good technology must be designed to fit its particular use. A digital book is not a webpage and should never be treated as one.

    This is most depressing. I publish both print and digital using the tools InDesign offers. More and more, I find myself wanting to create attractive print and fixed-layout epub books (for tablets). More and more, I’m irritated by the ‘ugly doesn’t matter’ irritations of Amazon’s formats and reflowable epub. They do almost nothing to make the reader’s experience enjoyable. That I dislike. It’s like shopping for a subcompact car and discovering that none have trunks because that space is taken up by a grossly unnecessary 50-gallon fuel tank.

  2. Palessa

    I think, to answer your question, it may liberate macro data, but microdata, the data I and other authors have about our own books and our own readers, is still there and hardly anyone knows how to read it. If there were some standard methodology that said this person read x% of your book in y time but only read 0.5x% of your other book in 2y time, then we could make certain logical leaps. So maybe it’s not the data that would be liberated but the methodology of how to better gather, read and understand the data that would finally come. I would love to have some standard where I, as a small fry, could gather my own data in a certain framework and read/understand/use it to get more attuned to readers more likely to enjoy the stories I have to tell.
