Yes, Machine Learning Can Help Predict a Bestseller

Expert publishing blog opinions are solely those of the blogger and not necessarily endorsed by DBW.

Last month, Mike Shatzkin wrote a blog post titled “Full text examination by computer is very unlikely to predict bestsellers,” in which he argued that an algorithm that predicts bestsellers, as claimed in the new book The Bestseller Code: Anatomy of the Blockbuster Novel, is impossible.

While I agree in theory with Shatzkin that an algorithm alone cannot predict whether a book will be a bestseller, that isn’t precisely what The Bestseller Code claims, nor is it what our experience working with machine learning at Intellogo suggests. What we aim to do is identify tones, moods, topics and writing styles similar to those of the books topping bestseller lists, which we can only do through algorithms, and in this way better understand what reading audiences want. Machine learning allows us to do just that.

We use machine learning to read blocks of text from a specified set of sources, such as the books on a bestseller list, the books on a publisher’s forthcoming publication schedule, and potential acquisitions, in a matter of seconds and to offer comparison and analysis (the user defines the framework of the search). Thus, the system can compare the current bestsellers, which represent current market interests, to the titles a publisher will soon publish in order to help identify where to focus marketing efforts.
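To make the idea of comparing forthcoming titles against current bestsellers concrete, here is a minimal sketch in Python. It is not Intellogo's method, which this post does not describe in detail; it assumes a plain bag-of-words model with cosine similarity, and the function names, the tiny stop-word list and the sample passages are all invented for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use much larger ones).
STOPWORDS = {"the", "a", "an", "and", "in", "at", "of", "to"}


def term_counts(text):
    """Lowercase the text and count its words, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in STOPWORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_against_bestsellers(bestsellers, forthcoming):
    """Rank forthcoming titles by similarity to the combined bestseller text."""
    profile = Counter()
    for text in bestsellers.values():
        profile.update(term_counts(text))
    scores = {
        title: cosine_similarity(term_counts(text), profile)
        for title, text in forthcoming.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample passages, for illustration only.
bestsellers = {
    "Hypothetical Thriller A": "The detective chased the killer through the dark city streets.",
    "Hypothetical Thriller B": "A killer stalks the city while the detective races against time.",
}
forthcoming = {
    "Forthcoming Mystery": "The detective hunted a killer across the city at night.",
    "Forthcoming Cookbook": "Whisk the eggs and fold in the flour before baking the cake.",
}

ranking = rank_against_bestsellers(bestsellers, forthcoming)
for title, score in ranking:
    print(f"{title}: {score:.2f}")
```

Word counts capture shared topics at best. Tone, mood and writing style would need richer features than this sketch uses, so treat it as a picture of the comparison step only.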

Shatzkin states that we must consider the “consumer analysis, branding, or the marketing effort to promote the book” and claims that a better indicator of what might become a bestseller is the distribution numbers in the chain stores. But what is carried in a chain store is not dictated by market demand so much as by the interests of the buyers. Any publisher that has received thousands of returns from said chain stores knows that system isn’t foolproof, either.

In this digital future, machine learning platforms can give publishers opportunities to get real-time information about their readers, figure out what is working in the marketplace, and, perhaps, make the bestseller lists a more accurate depiction of what readers want to read, not simply what is available.

Imagine a day when we take all our data about what people are reading and provide publishers (and authors) with ideas of what people want to read, where to find those audiences, and better ways to reach them. This is the model that the film and television industries are already moving toward, with the help of Netflix and Amazon, so why shouldn’t book publishing take advantage of this market information? This type of decision support has not been possible up to this point, and publishers have often published books blindly, hoping that they would find the right audience and sell well.

Though “big data” can be a taboo subject when we talk about the romance of publishing, there are undeniable benefits to be had from using platforms that give publishers and authors information from which they can make informed decisions on how to invest their time and money.


9 thoughts on “Yes, Machine Learning Can Help Predict a Bestseller”

  1. Sophie

    This frustrates me as I find that the books I enjoy reading are often unique and have different voices next to another, more predictable author. This includes bestsellers, so at the end of the day only an educated team of people dedicated to producing the best fiction can determine a bestseller.

  2. Max Myers

    Interesting in theory, but doesn’t allow for that most fickle of all factors, humans. Trends do not predictability make. Just ask Lennon and McCartney, Chandler, Hammett, Tolkien, Steel, Patterson, etcetera, etcetera.

  3. Christa D. Fairchild, Intellogo

    I think the message here is that a lot of what people enjoy can be measured. Shatzkin refers to business solutions, hardly a reflection of the creative process. No one wants to take away from the creativity or craft that goes into writing. Outliers can never be anticipated.

  4. Neil Balthaser

    Some opinions on this piece may reflect a misunderstanding of artificial intelligence and machine learning. Intelligences like Intellogo are capable of seeing patterns in things that we as humans are unable to see. That’s the virtue of artificial intelligence. There’s no reason to fear this, but it should not be discounted. While Intellogo readily recognizes patterns such as writing styles, tones, situations, themes, concepts, characters and plots, it also sees patterns which frankly we don’t see and may never see. That’s a fact. Can it be discounted? Sure. Should it be discounted? Probably not. Just as AI is being used in the medical field to seek out previously unknown patterns for early disease detection, so the same computer science is being applied here.

  5. mqats

    The machine may become quite strong at forecasting bestsellers, but the question will remain, probably forever: How many false positives, and perhaps more important, false negatives, will occur?
    Also, in any enterprise of literary publishing, it ought to be considered more important to separate “break-evens” from “money losers.” For those who recognize no motivation other than getting rich, this may be hard to understand.

    1. Matt

      False positives and false negatives can be collected in a confusion matrix, and specificity and sensitivity can be calculated pretty easily. Fwiw.

  6. Barbara Miller

    Do you rate every book published or a select group for a select publisher? I’m sure some of this can be valid, but we should never discount the intelligence on the ground, the front line. I don’t think figuring out what readers want or will gravitate toward is so cryptic. There will always be a surprise flop or runaway bestseller. When I used to attend seasonal sales meetings, the roomful of folks from around the country or a region could call what folks would or wouldn’t want to buy. The public lets a store know what they purchase, what they can’t find, etc. There is a lot more trickle-up than folks talk about lately. Bestsellers happen with the folks who may not read as regularly or a demographically unifying subject. I do believe data can offer useful insights but is also as overrated as the smell of paper, this coming from someone who reads on mobile though still prefers print when equally convenient.
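For readers curious about the confusion matrix that Matt mentions in his reply to comment 5, here is a minimal sketch in Python of how false positives and false negatives turn into sensitivity and specificity. The function names and the eight sample outcomes are made up for illustration; they are not results from Intellogo or from The Bestseller Code.

```python
def confusion_counts(actual, predicted):
    """Tally true/false positives and negatives from paired boolean outcomes."""
    tp = fp = tn = fn = 0
    for was_bestseller, flagged in zip(actual, predicted):
        if flagged and was_bestseller:
            tp += 1  # predicted bestseller, and it was one
        elif flagged and not was_bestseller:
            fp += 1  # false positive: predicted bestseller, but it was not
        elif not flagged and was_bestseller:
            fn += 1  # false negative: a bestseller the model missed
        else:
            tn += 1  # correctly predicted as not a bestseller
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """Share of actual bestsellers that the model flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Share of non-bestsellers that the model correctly left unflagged."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Made-up outcomes for eight hypothetical titles.
actual = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, True, False, False, False, False]

tp, fp, tn, fn = confusion_counts(actual, predicted)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

The missed bestsellers that mqats worries about show up as low sensitivity, while titles wrongly flagged as hits lower specificity.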


