Expert publishing blog opinions are solely those of the blogger and not necessarily endorsed by DBW.
Big data was one of Frankfurt Book Fair’s hottest topics of discussion this year among digital publishing professionals.
Viktor Mayer-Schönberger, professor of Internet Governance and Regulation at the Oxford Internet Institute and author of the recent international best-seller Big Data, spoke forcefully about big data’s relevance to publishers and the insights they can gain from it during the Contec Conference. He acknowledged the adoption challenges from an organizational perspective but seemed very confident about the future. According to Mayer-Schönberger, it’s less a matter of if than of when the publishing industry will meaningfully embrace big data analysis.
Of course, big data is not for everybody; size matters. Small and mid-sized publishers and retailers may not be able to capture enough data to warrant close analysis, to say nothing of the financial, technological and organizational capabilities required to undertake it.
Meanwhile, big publishers, distributors and retailers theoretically benefit from vast economies of scale when it comes to data, but taking advantage of this resource isn’t always immediately attainable. The very few that can afford to do so should invest in new big data technologies and bring in technical competencies from outside rather than rely on self-styled technology and data experts on an ad hoc basis. The temptation to wait for others to do it first and defer costs and investment until then is considerable, and often for good reasons.
Innovating with big data technology
Nevertheless, one publishing/distribution group showcased a brand-new approach to big data, showing that long-term thinking can be the key to departing from the waiting-for-others-to-do-it-first attitude and embracing a more proactive one. Vincenzo Russi, Chief Digital Officer of Messaggerie Italiane, explained a new methodology for harnessing big data technologies to solve complex data analysis challenges quickly and at moderate cost.
Messaggerie Italiane is the owner of Italy’s largest book distributor; its largest national online bookseller; and the country’s third largest book publishing group, with an overall yearly turnover of about $620 million. Russi and his team were looking for new sources of information to inform their decision-making process: innovative, efficient techniques to gather data from disparate sources and convert it cost-effectively into actionable insights.
Russi’s ingenious approach exploited state-of-the-art big data technologies, practices and methods to address sophisticated (but not big) data analysis challenges that would otherwise have cost ten times the investment and time. Thanks to cost-effective hardware driven by Moore’s law, scalable cloud capabilities and stacked services, big data technologies can also be excellent tools for “small data” analysis from a business perspective, while delivering substantial cost reductions, shorter deployment times and greater engagement from internal users.
Of course, most publishers aren’t nearly as big in their primary markets as Messaggerie Italiane is in its own, which means big data analysis is not only out of reach for them but an inappropriate goal to pursue. In fact, it apparently wasn’t even the goal of Messaggerie Italiane either; the company just wanted to address sophisticated analyses of “small data” in a more cost-effective manner, by using big data technologies. Needless to say, the infrastructure they put in place is capable of running analysis on “big data” too. (This was not part of Russi’s presentation, but he has promised to publish a long article on his project on Tisp. Check it out; it’s available now.)
Data analysis and big data
As Chantal Restivo-Alessi, Chief Digital Officer of HarperCollins, reminded us at the Digital Book World Conference in January 2014, “big data is a little bit like teenagers talking about how many girlfriends they have.”
When it comes down to actually undertaking meaningful data analysis, everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, and therefore everyone claims they are doing it, too.
Note that I’m referring here to data analysis and not big data. Frankfurt gave us the latest proof that many people are still very confused about the distinction between the two. Data analysis and big data analysis aren’t the same thing, as I had to explain during the Q&A of the masterclass on creativity and commerce.
For many stakeholders in today’s publishing industry, any serious effort to analyze information beyond a few thousand rows of Excel data still feels like something very big and challenging. Yet that is exactly what many industries have been doing for years. In fact, book retailers and large publishers have been doing it, too.
Many already use database management systems (DBMSs) and data visualization tools that allow for complex data analysis that cannot be done with very basic tools like Excel. DBMSs don’t have anything to do with “big data,” even though they handle lots of data.
The simple fact that some forms of data analysis require a larger technology infrastructure and methods that are more sophisticated doesn’t mean the data being analyzed is “big.” It just means the analysis is somewhat complex.
Here are three key features (among others) that distinguish big data analysis from “just” data analysis (of what we might as well call “small data”):
- Big data typically consists of a very large volume of unstructured data that cannot be handled by standard DBMS or more advanced relational database management systems (RDBMS) or object-relational database management systems (ORDBMS).
- Big data is not distinguished only by the size of the databases used to analyze it, though the order of magnitude is typically the Exabyte. To get a sense of the scale: one Exabyte is a million Terabytes, so stacking up 1-Terabyte SSD memory cards holding it all would give you a pile of a million cards, roughly a kilometer high (do you really have that much data?). But beyond size, big data analysis is also characterized by challenges in the workflow and in the services that must be delivered to the adopting organization, which make standard DBMS/RDBMS/ORDBMS systems unfit for the purpose. A big data analysis system needs adequate data-centric processes all the way from capture, ingestion and curation through search, modelling, analysis and visualization, not to mention other critical operations like storage, maintenance, sharing, transfer, security and availability.
- Big data analysis aims to extract the additional information that can be derived from analyzing one huge set of related data, as opposed to separate, smaller sets containing the same total amount of data, allowing correlations to be identified that suggest business trends. This excludes any hard divide-and-conquer approach to making the size more manageable, as splitting the data would reduce the chances of spotting new trends (in other words, you don’t just divide your data into a thousand smaller data sets).
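The Exabyte comparison above is easy to sanity-check with a few lines of arithmetic. This is only a back-of-envelope sketch: the decimal unit conversion and the one-millimeter card thickness are assumptions, and the exact height of the pile depends entirely on how thick you take a card to be.

```python
# Back-of-envelope: how many 1-Terabyte cards make up one Exabyte,
# and how tall would the stack be? Decimal units and a ~1 mm card
# thickness are assumptions, not measurements.
TB = 10 ** 12             # bytes in a Terabyte (decimal)
EB = 10 ** 18             # bytes in an Exabyte (decimal)
CARD_THICKNESS_M = 0.001  # assume ~1 mm per card

cards = EB // TB
stack_height_m = cards * CARD_THICKNESS_M
print(f"{cards:,} cards, stack about {stack_height_m:,.0f} m tall")
```

Whatever thickness you plug in, the point stands: an Exabyte is a million Terabyte cards, a scale very few organizations in publishing will ever reach.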
Publishers aren’t the only ones confused; retailers also seem to mix up big data analysis and advanced data analysis, possibly just for the sake of simplifying communication with publishers. Kobo’s new white paper provides very useful insights into ebook reading data analysis, plus concrete examples of how publishers can take advantage of it. However, the white paper misleadingly suggests these strategies are related to big data, when they are actually just innovative analytical approaches to ebook reading data.
These aren’t analyses of Exabytes of unstructured data at all. On the contrary, the data is very well structured indeed: for any given ebook, there are very interesting statistics to gather on reading behaviors. If you publish a thousand ebooks, you can run a report for those thousand ebooks with dozens of numbers attached to each one. Of course, producing these statistics entails a significant effort to first process the raw data. However, that can usually be accomplished with a divide-and-conquer approach (title by title) and a robust ORDBMS infrastructure and services; one just needs to keep track of a few aggregated statistics at the title level. If Kobo uses a big data technology infrastructure to handle small data, in a way comparable to Messaggerie Italiane’s, it could be clearer about it.
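To make the title-by-title, divide-and-conquer point concrete, here is a minimal sketch of that kind of aggregation in plain Python. The event fields and values are entirely hypothetical; real reading data would come from the retailer’s platform, and each title’s statistics can be computed independently of every other title’s.

```python
from collections import defaultdict

# Hypothetical raw reading events: (title_id, pages_read, finished).
# In practice these would be exported from the retailer's systems.
events = [
    ("isbn-001", 40, False),
    ("isbn-001", 260, True),
    ("isbn-002", 12, False),
    ("isbn-001", 150, False),
]

# Divide and conquer: fold the raw events into a few aggregated
# statistics per title, so each title's report stands on its own.
stats = defaultdict(lambda: {"readers": 0, "pages": 0, "completions": 0})
for title_id, pages, finished in events:
    s = stats[title_id]
    s["readers"] += 1
    s["pages"] += pages
    s["completions"] += int(finished)

for title_id, s in sorted(stats.items()):
    completion_rate = s["completions"] / s["readers"]
    print(title_id, s["pages"], f"{completion_rate:.0%}")
```

Nothing here requires a big data stack: a thousand titles with a few dozen aggregated numbers each fits comfortably in any conventional database.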
More precise terminology would help reduce the confusion in the publishing industry. Exploiting Kobo’s recommended analytical approach is certainly a very good idea for many publishers (I’ve personally overseen several projects with their reading analytics, and I’m very confident about the insights you can gain; feel free to reach out if you need advice), and it would also be useful if other large ebook retailers like Amazon, Apple, Barnes & Noble and Google were to offer similar ebook reading statistics and even share their own white papers. But none of that should be mistaken for big data analysis.
The publishing industry should seriously think about smart data analysis beyond the good old Excel spreadsheet or the simple aggregation and segmentation of plain sales figures on DBMSs. And it should stop fooling itself with a buzzword like “big data” that has a very precise meaning outside this industry. Let’s call those practices “smart data analysis” instead (meaning smart analysis of “small data”).
After all, the kinds of analyses publishers can undertake with less-than-“big” data sets can help most publishers smaller than the Big Five gain valuable insights and trigger informed, data-driven decision-making throughout their organizations. That makes “smart data analysis” a much more widely applicable practice within the publishing ecosystem. With or without big data technologies, the focus should be smart data analysis, beyond just book-keeping, sales breakdowns, growth rates and average prices. Data scientists can do a better job for you than in-house, self-styled data experts.
Finally, even if truly big data analysis will be relevant to only a few big players in the near and medium term, the good news is that there is room for scalable big data technologies in many organizations. As Messaggerie Italiane has shown, big data technologies and methods can also be exploited for smart data analysis on small data, while helping organizations improve efficiency and strengthen their data-driven decision-making. Using scalable big data technologies looks like a very promising and innovative opportunity for more than just the usual few big players. As we used to say, the future cannot be predicted, but it can be invented.
And for those who can’t wait and already dream of truly big data today, consider how the Formula One McLaren team became the “McKinsey of Big Data” and what they are doing to manage busy airports like London’s Heathrow, oil and gas exploration, drug discovery, healthcare for kids and more.
What does McLaren have to do with books? Absolutely nothing. But Formula One racing had nothing to do with airports, oil drilling, pharmaceuticals or healthcare until just a few years ago. The power of big data goes way beyond our present imagination. Publishing is one of the most creative industries in the world; just think what could happen if we were to successfully integrate publishing and big data.
One last thought: in addition to asking Ron Dennis to finally publish his autobiography with you, you might also want to invite the McLaren Applied Technologies team to your headquarters… Or maybe not: let others do it first, right?