Loving the Alien: Machine Learning and Publishing

Expert publishing blog opinions are solely those of the blogger and not necessarily endorsed by DBW.

Over the past few weeks, Mike Shatzkin, Neil Balthaser and Ali Albazaz have debated whether machine learning systems will be able to predict a bestseller. (Mike’s initial blog post is “Full text examination by computer is very unlikely to predict bestsellers”; Neil’s response is “Yes, Machine Learning Can Help Predict a Bestseller”; and Ali’s article is “Artificial intelligence and the art of reader-driven publishing.”) It’s been an interesting exchange about the value provided by today’s publishing organizations and the future of predictive analysis. That said, the growing deployment of machine learning systems raises larger questions for publishers that must be addressed soon, before publishers lose control over their intellectual property.

Publishing is a technology-driven business (a byproduct of the printing press) that has evolved in lockstep with advances in print and distribution technologies. While ebooks have grabbed most of the attention over the last decade, search has been and will continue to be the most significant technology driving publishing. The evolution of search, in turn, is being driven by innovations in machine learning-based discovery and recommendation.

Our concept of “knowledge” is in large part derived from our ability to classify information. The Dewey Decimal Classification system introduced the notion of relative location. Before this system’s introduction, books were stored on library shelves in the order in which they were acquired. The Dewey Decimal system made it easy to browse the shelves, discover new books in a given subject area, and form connections that would not have been visible while browsing a chronologically organized collection.

A similar revolution in how content is organized and accessed is underway in the world of computing. To date, our access to content online has (for the most part) been dictated by taxonomies and tags created by humans: BISAC codes, keyword systems and user-generated tags. These ways of identifying content look backward. We file a book based on how it fits into our historically derived categorization system. In this way, we are using an essentially static (or “solid”) form of classification in a world where knowledge is increasingly dynamic (or “liquid”).

Machine learning systems combine recent advances in computing platform technologies such as networking, data storage, and processing, with advances in fields such as computational statistics, natural language processing, and sentiment analysis (to name just a few of the related areas of relevant research). These systems now have access to huge collections of content (“big data”) that go well beyond what any one person or team can process in a lifetime. They have the ability to analyze content, dynamically derive tags and keywords, and discover conceptual relationships between content elements in the data collection that may not have been evident when the source content was first published. In addition, the quality of the computer-generated results improves over time. Put another way, these systems now have the ability to learn.
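To make the idea of derived tags and discovered relationships a little more concrete, here is a minimal sketch in Python. It uses TF-IDF weighting and cosine similarity, a classic text-analysis technique chosen purely for illustration; it is not a description of how any particular vendor's system works, and it does not "learn" over time the way the systems described above do. The catalog, its titles and descriptions, and the function names (`tfidf_vectors`, `top_keywords`, `most_similar`) are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-catalog: titles and descriptions are invented for illustration.
CATALOG = {
    "Deep Sea Atlas": "ocean whales coral reef ocean currents",
    "Whale Song": "whales migration ocean song",
    "Sourdough Basics": "bread flour yeast oven bread",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each title to a {term: weight} dict using smoothed TF-IDF."""
    tokens = {title: tokenize(text) for title, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many descriptions does each term appear?
    doc_freq = Counter(t for words in tokens.values() for t in set(words))
    vectors = {}
    for title, words in tokens.items():
        counts = Counter(words)
        vectors[title] = {
            term: (count / len(words))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def top_keywords(vector, k=3):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(title, vectors):
    """Return the other title whose vector is closest to the given one."""
    others = [t for t in vectors if t != title]
    return max(others, key=lambda t: cosine(vectors[title], vectors[t]))


VECTORS = tfidf_vectors(CATALOG)

if __name__ == "__main__":
    for title, vector in VECTORS.items():
        print(title, "->", top_keywords(vector), "| closest:", most_similar(title, VECTORS))
```

Nothing here relies on a human-assigned category: the keywords and the "closest title" relationship are both computed from the text itself, which is the shift from static to dynamic classification described above, shown at toy scale.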

Once again, publishing is a technology-driven business. Digital technologies—including e-commerce, ebooks and audiobooks—have created new business opportunities. Machine learning is no different. Computers that can learn can help sustain publishing. Enhanced discovery and recommendation engines can help sustain a diversified retail ecosystem by democratizing access, giving independent retailers (and publishers who sell direct) the ability to personalize recommendations. Computers that can learn can also expose relevant content that’s not easily found using the algorithms currently employed by the industry giants.

The major players in software development, search and retail, including Google, Microsoft, Apple, IBM and Amazon, along with a large number of start-ups, are investing in machine learning. These companies will be knocking on the doors of publishers to ask for access to their assets. They will try to sell content owners on the value of enhanced discovery—and some of them will deliver. However, there are also long-term implications of the growth of machine learning that require careful consideration.

The same systems that deliver book recommendations today will be able to deliver highly targeted answers to specific queries tomorrow. These systems will quickly evolve from recommending books, to delivering excerpts, to delivering machine-authored responses that synthesize information from a wide variety of sources.

Publishers need to understand the applications of machine learning and how these systems may evolve. They need to determine whether they can afford to let others decide the direction these systems take, or make their own investments (in-house or through partnerships) that give them more of a say. They need to determine the market value of helping teach computers to author content and determine how their own authors will be compensated for these new uses of their work.

For more than 500 years, the publishing community has taken advantage of advances in technology to enhance its ability to produce and disseminate information, knowledge and creative content. Machine learning has the potential to drive a significant expansion of our notion of publishing. While the field is in its infancy, it’s growing up quickly. Publishers need to understand how machine learning is transforming content discovery, how to effectively evaluate the quickly evolving range of partners and platforms, and how to craft deals that appropriately reward them for this new use of their intellectual property.



