Expert publishing blog opinions are solely those of the blogger and not necessarily endorsed by DBW.
It might as well be time to address the elephant in the room. The pachyderm that is causing fear, uncertainty and doubt among authors, agents and publishers is the prospect of how data, and reading data in particular, may affect the creative process of writing, editing and marketing books.
My starting point shall be the data-smart publishing workshop I led at the Digital Book World Conference in New York last week. One of the attendees was publishing reporter Alexandra Alter, who reported her impressions of the issues for the New York Times in an article titled “Moneyball for Book Publishers: A Detailed Look at How We Read.”
Her article also included expert graphics by Karl Russell that used data provided by reader analytics company Jellybooks (that would be my company). The title of the article was borrowed from the workshop presentation of Tommy Doyle, general manager for science, technical and medical books (STM) at Elsevier (RELX), who reported on how RELX uses data to be a data-smart publisher in the 21st century, playing “Moneyball for Publishers.”
The New York Times article has been discussed widely in publishing circles, and one of the key themes discussed among the literati matches a key topic of the workshop: a publisher’s struggle with using data on reading and sales versus relying on instinct or “Bauchgefühl” (gut feeling), as the Germans call it. Kornelia Holzhausen, director of digital media for Piper Verlag (Piper Publishing House) devoted a large part of her DBW talk to this dilemma.
Piper Verlag, headquartered in Munich, is one of Germany’s largest and most renowned fiction publishers, and part of the Swedish media group Bonnier. Piper was founded in 1904 and considers itself a “traditional” publishing house, with more than 100 years of literary history and tradition, wearing the labels “traditional” and “conservative” with pride.
Yet they were one of three German publishing houses (“Verlage”) to undertake a reader analytics pilot project with Jellybooks in Germany over the past 12 months. It was not an easy decision for them. It was debated intensely in-house. Marketing and sales were all in favor, while editorial was much more reluctant, even though Jellybooks had spoken about the technology at the company’s sales conference earlier in the year (many Jellybooks engagements with big publishers start with a sales conference presentation to show people in-house this strange, new thing called “reader analytics”).
One key question debated at Piper was, “Should we tell our authors?” Piper decided against telling authors about the experiment, as is true for every publisher Jellybooks has worked with to date. The fear is that poor data could cause authors to worry that the publisher might use the data to rewrite the book—that an author’s standing with their editor could be diminished. Thus Piper, like every other publisher to date (some of them I am not even allowed to mention), made the decision to keep silent and just go ahead with the experiment, and once they had the data in hand, decide whether to share it with authors.
In fact, the question of how you break bad (or good) reader engagement data to the author has still not been resolved.
Another concern at Piper was, “Is the data representative?” I think this question has now been answered by looking not just at completion rates (how many readers who start a book finish it), but also at the velocity with which people read the book, the net promoter score (would readers recommend the title?), their comments on the novel and its characters, why they liked it or why they abandoned (!) it, and other data collected during the campaign.
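For readers who want to see how two of these measures are typically calculated, here is a minimal sketch. It assumes the conventional net promoter score definition (a 0-10 "would you recommend?" answer, with 9-10 counted as promoters and 0-6 as detractors); this post does not describe how Jellybooks computes its score, and the `Reader` record and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reader:
    """Hypothetical per-reader record; field names are illustrative only."""
    started: bool
    finished: bool
    recommend_score: Optional[int]  # 0-10 survey answer, None if unanswered


def completion_rate(readers: List[Reader]) -> float:
    """Share of readers who started the book and also finished it."""
    starters = [r for r in readers if r.started]
    if not starters:
        return 0.0
    return sum(r.finished for r in starters) / len(starters)


def net_promoter_score(readers: List[Reader]) -> float:
    """Conventional NPS: percent promoters (9-10) minus percent detractors (0-6)."""
    scores = [r.recommend_score for r in readers if r.recommend_score is not None]
    if not scores:
        return 0.0
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100.0 * (promoters - detractors) / len(scores)


# Made-up sample: four readers started the book, one never opened it.
sample_readers = [
    Reader(started=True, finished=True, recommend_score=10),
    Reader(started=True, finished=True, recommend_score=9),
    Reader(started=True, finished=False, recommend_score=3),
    Reader(started=True, finished=False, recommend_score=7),
    Reader(started=False, finished=False, recommend_score=None),
]

print(completion_rate(sample_readers))
print(net_promoter_score(sample_readers))
```

Readers who never open the book are left out of the completion rate, which matches the definition above (readers who start a book and finish it).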
Overall, a relatively detailed “profile of a book” is built up, and once you have results from 50-100 users that picture remains relatively stable. Adding more users (we have tested with up to 600) adds more granularity and precision, but the trends are clear after just 50 readers.
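One way to picture why more readers add precision rather than a different picture is the textbook margin of error for a proportion. The sketch below is not how Jellybooks assesses stability; it assumes a simple random sample and the normal approximation, and the 50% completion rate is only an illustrative value.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an observed proportion p from n readers.

    Assumes a simple random sample and the normal approximation.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


# Illustrative only: a book with an observed 50% completion rate.
for n in (50, 100, 600):
    print(n, round(margin_of_error(0.5, n), 3))
```

Under these assumptions the uncertainty shrinks with the square root of the number of readers, so going from 50 to 600 readers narrows the estimate but does not usually reverse a large difference between two books.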
A key lesson from the pilot, and others like it, is that reader analytics in trade publishing (at least for professionally edited books) does not tell you much about the quality of the writing at the sentence, paragraph or chapter level. You don’t see people suddenly dropping off in chapter 13 or on page 217. Completion rates are mostly driven by characters, narrative, tone, language structure and similar factors. Readers judge a book as a “Gesamtkunstwerk” (a piece of art that has to be viewed as a whole and not just as a collection of its elements). Readers make up their minds in the first 50-100 pages as to whether they like a book (Amazon, which has an immensely larger treasure trove of data, has confirmed the same finding to us). Thus, reader analytics helps with identifying whether a book resonates with a mass audience (high completion rate for a broad segment of users) and, if not, what sub-audience it actually appeals to (an overall low completion rate is often much better for a particular age group or demographic).
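The sub-audience idea can be sketched in a few lines: a modest overall completion rate can hide a segment that finishes the book far more often. The age bands and numbers below are invented for illustration and are not Jellybooks data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def completion_by_segment(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Completion rate per segment.

    records: (segment_label, finished) pairs, one per reader who started the book.
    """
    started = defaultdict(int)
    finished = defaultdict(int)
    for segment, did_finish in records:
        started[segment] += 1
        if did_finish:
            finished[segment] += 1
    return {segment: finished[segment] / started[segment] for segment in started}


# Hypothetical test readers, grouped into two made-up age bands.
sample_records = (
    [("under 35", False)] * 4
    + [("under 35", True)] * 1
    + [("35 and over", True)] * 3
    + [("35 and over", False)] * 1
)

overall = sum(did_finish for _, did_finish in sample_records) / len(sample_records)
print(round(overall, 2))
print(completion_by_segment(sample_records))
```

In this made-up sample fewer than half of all readers finish, yet three out of four readers in the older band do, which is the kind of pattern that points marketing at a specific audience.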
Every book has its audience, and one of the key outputs of reader analytics is the ability to find what that audience actually is. Marketing and publicity departments love the data. Editorial still fears it at times.
Editorial is gradually and slowly—very slowly—embracing reader analytics. Acquisitions editors have discovered that reader analytics can provide insights beyond a book’s sales numbers (Nielsen has been in the audience data business for a very long time). Was the content at fault or was it marketing? Reader analytics can highlight that a book that didn’t sell actually engages readers very strongly, but that the cover was poorly designed (the most common error), that a stronger book launched at the same time and overshadowed the book, that the book was wrongly positioned, or that timing was poor (bad luck, plain and simple).
Reader analytics also shows that books that sell well are not necessarily read. Titles that are literary prize winners might be put on the shelf to impress, but that doesn’t mean they ever come down. Debut novels make up another category: they may get great reviews, but the majority of readers can’t get into them and therefore discard them, which explains why the author’s second book often fails, as there is no existing audience from the first book that is hungering for the second. In this case, Nielsen sales data provided a false positive: a genuine data point that actually meant something else. Thus, reader analytics adds to the editorial knowledge trove of the acquisitions editor, helping her to judge the next submission or proposal more accurately.
However, this is a point that cannot be overemphasized: data helps shape decisions. It does not make decisions. There is no artificial intelligence that collects the data and decides a book will be published or not.
Every book tested by Jellybooks went on to be published, but in several cases the marketing budget for titles tested was increased or decreased based on the insights developed from reader analytics.
One of the key economic applications of reader analytics is to allocate precious marketing dollars to the right books. Marketing is like an accelerant: it is designed to create a critical mass of readers so that word of mouth can propel a book forward from that point. And reader analytics, more than anything, measures a book’s word-of-mouth potential based on the strength of engagement an audience has with the book.
Reader analytics also helps with packaging books. Covers raise expectations for readers, and the choice of cover design should motivate readers to pick up and discover the book, yet should not mislead and create false expectations, pushing readers to drop off after a few chapters, disappointed that this was not the book they were expecting.
Book buyers who don’t read books don’t recommend books. And thus, the vital word-of-mouth promotion is snuffed out.
One of the most important arguments for reader analytics is that it helps identify the book’s audience. Should publicity and marketing reach out to readers who are young or old? Male or female? Urban or rural? Commuters or weekenders? Reader analytics offers unprecedented insights as to which audience and niche a book appeals to.
Because Jellybooks’s data deals only with ebooks, some pundits noted that the numbers reported in the New York Times piece may not accurately reflect how physical books are read. It is true that absolute completion rates for physical books may indeed be higher, as they are less likely to be forgotten or overlooked. But that misses the point. Reader analytics primarily measures the relative performance of one book compared to alternative books the reader might enjoy more. Therefore, many of the perceived biases are eliminated because they affect all tests equally.
We are of course making a major assumption, namely that the relative ranking of ebooks corresponds to a similar ranking among physical books. But we have seen that in some cases this is not true. When a book is purchased to show off (usually physical purchases) or because it is assigned reading, there can be major format differences in completion rates. However, when books are bought for immediate or future entertainment, the factors are likely to be so similar that reader analytics collected on ebooks can be used to judge the appeal of a book across all formats. After all, it is still the same content. And incidentally, we have often seen little difference between iOS (iPhone) and Android (Samsung Galaxy, for example) even though they use very different reading apps with different ways of displaying books.
To put it simply, data is not your enemy. Data is your friend.
For a discussion of some of the other arguments in the data vs. instinct debate, please visit my earlier post “Instinct Versus Data in Book Publishing” on Medium, or my many responses to comments on the New York Times article itself. May the conversation continue! There may even be a book in the works. Until then, you can follow me on Twitter at @arhomberg or post in the comments section below.
This is also a good time to mention that we will be holding a follow-up workshop called “Read on!” to coincide with the Frankfurt Book Fair this autumn. And there will also be an opportunity to hear about data-smart publishing at the 3rd European Digital Distributor’s meeting in Madrid, which is organised by FANDE, the Spanish Federation of Book Distributors.
Most of the data and results reported in this post were based on results from trade publishing. We are currently undertaking a number of pilots in non-fiction that cover STM, educational and professional publishing, and hope to report on these at our Frankfurt workshop in autumn.
Earlier posts in the data-smart publishing series:
• “The Internet of Bookish Things”
• “Reading Fast and Slow – Observing Book Readers in Their Natural Habitat”
• “Start Strong or Lose Your Readers”
• “What Books Have the X-Factor? Measuring a Book’s Net Promoter Score”
• “Men Are from Mars, Women Are from Venus, But What About Readers?”
• “How Does Age Affect Reading?”
• “8 Reasons Why People Buy Books”
Note: All the data reported in this post was collected in test reading projects financed by Innovate UK. EPUB3 files were modified with candy.js by Jellybooks so we could record, store and extract the user’s reading behavior when using iBooks, Adobe Digital Editions and selected Android reading applications. We have since extended support to VitalSource and other reading applications with more to come. The data stored within the ebook file was extracted when the user clicked the “sync reading stream” button at the end of a chapter or the end of the book. All users were informed about and consented to the presence of the analytics software.