Analyzing the Author Earnings Data Using Basic Analytics
Last week, when I was interviewed on the Self-Publishing Roundtable about the Digital Book World and Writer’s Digest 2014 Author Survey, I was surprised to learn of the deep animosity many indie authors felt toward this survey and its reporting. For Phil Sexton, who was also with me on the panel, and for myself, this was a hard pill to swallow, particularly since we have been doing this work with the intention of providing the best information we could bring to publishers, self-publishing service providers, and authors (as our upcoming reports will do more directly).
Now this week, along comes indie author Hugh Howey with his Author Earnings report. I understand his project is being heralded, likely by the same indie authors who may not quite understand or agree with what I and the team at Digital Book World are doing, as the answer to the data problems in publishing. Howey explains in his report that:
Choosing which way to publish is becoming a difficult choice for the modern author. This choice has only grown more challenging as options have expanded and as conflicting reports have emerged on how much or how little writers can expect to make. Our contention is that many of these reports are flawed, both by the self-selected surveys they employ, the sources for these surveys, and, occasionally, the biases in their interpretation. Our fear is that authors are selling themselves short and making poor decisions based on poor data.
As I opened Howey’s report and data, I felt hopeful that he might be able to circumvent the issues we’d had with the Digital Book World and Writer’s Digest Author Survey, which is a voluntary rather than a scientific sample. Howey’s method of data collection overcomes some of the limits of relying on voluntary survey data and provides a new source of analysis and information that may be used to support or call into question other findings. At the same time, it introduces other limitations, some of which he clearly delineates and others that he does not.
Not everyone has the kind of training and expertise I bring to this type of research with my doctorate and years of research and teaching. What I would like to do here is take a closer look at Howey’s numbers, which he graciously provided in the raw, explain the limits, and provide you with more informed conclusions based on the statistics using widely accepted research and analysis techniques.
Interestingly, despite the posturing and vitriol on the blog posts and author loops, Howey’s data doesn’t contradict any of the findings I’ve reported here or elsewhere. This fact is actually quite depressing. We’d all like it if authors earned more; as a hybrid author myself, I know I would. And as a lover of books and reading, I genuinely believe authors are undervalued in society.
The Digital Book World and Writer’s Digest Author Survey has its limits, as I have outlined in my other blog posts: It is a non-scientific sample of volunteers, many but not all of whom responded to an invitation from Writer’s Digest to complete the survey. Nonetheless, the analytic methods are sound and the findings highly illuminating, especially compared to an investigative journalism piece which would rely on far fewer sources.
Trish McCallan, the author who interviewed me for the Self-Publishing Roundtable, presented the concern that the survey missed the success cases and, therefore, misrepresented what self-published authors were truly achieving. Howey had brought up a similar argument, faulting us for comparing the traditionally published success stories to the multitudes of self-published authors and failing to take into account the authors going the traditional route and stuck in the slush pile.
As a result, the indie authors, particularly the outraged on the Kindle Boards and Howey himself, were arguing that our findings were biased, skewed, or utterly worthless. I disagree with all of these assessments. As I explained, it would take an awful lot of success stories to move the numbers we had collected and change the story that we reported in a meaningful way—as you will soon see.
Howey’s numbers provide the clamored-for corrective of focusing on the successes, rather than on the broad spectrum of authors we have in our survey. Rather than interview authors or rely on a self-selecting group of survey respondents, Howey selected his sample based on sales-ranking success. He captured the Amazon rankings data for Kindle ebooks in the top-selling fiction genres of romance, sci-fi/fantasy, and mystery/thriller. His sample consists of a snapshot of nearly 7,000 ebooks (6,887 books in the data I downloaded).
The benefit of these data is that we know who was selected and who was not, unlike in the Author Survey where we don’t know how people selected themselves. Moreover, we can use statistical tests that we can’t use with a voluntary sample, since the sample Howey collected is representative of best-sellers on any given day. Yet the sample has its own inherent bias and selection issues, which are important depending upon what question you are trying to answer.
To be clear, Howey’s sample consists of the top ebooks on Amazon in three very populated genre categories. Opening my Web browser to the Amazon Kindle store today, I find that combined these three categories have 443,321 titles, although the same book may appear on more than one of these lists. Therefore, Howey’s data represent somewhere between the top 1.5% (assuming no overlap of titles) and 4% (assuming full overlap and using romance, the largest category, as the denominator) of ebooks in these genres. I would argue that far from being the “mid-list,” the way Howey presents it, these are actually and only the elite.
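For readers who want to check the lower bound, here is a minimal Python sketch of the arithmetic. It uses only the two counts quoted above; the function name is purely illustrative, and the romance-only count needed for the upper bound is not given in this post, so it is left as an input rather than guessed.

```python
def share_of_catalog(sample_size, catalog_size):
    """Return the sample as a percentage of the catalog it was drawn from."""
    return 100.0 * sample_size / catalog_size

SAMPLE_BOOKS = 6887       # books in the downloaded data file
COMBINED_TITLES = 443321  # titles across the three genres, assuming no overlap

# Lower bound: every title counted once across the three genre lists.
lower_bound_pct = share_of_catalog(SAMPLE_BOOKS, COMBINED_TITLES)
print(f"Lower bound: top {lower_bound_pct:.2f}% of titles")
```

The upper bound would come from calling the same function with the romance category's title count as the denominator.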
Howey makes a bunch of assumptions to get us from sales rankings to author earnings. Using data he and his friends collected from their own sales rankings, he extrapolates from the Kindle ebook rankings to the number of units sold on data collection day. Then, based on the kind of publisher and the price of the ebook, he further assumes the royalty rate the author receives. Finally, to get to yearly income, he multiplies the daily author revenue by 365. This last calculation assumes a constant rate of sales per day with no fluctuation and no difference in the sales trajectory of books based on publication date, publisher type, or being part of a series. In addition, while there are close to 7,000 books, Howey aggregates author revenue in his last set of analyses, bringing the total to 3,349 authors and giving them credit for all of their books in the top sellers, assuming all of their books are represented there.
I’m not sure that any of these assumptions are good ones, but I’m not going to challenge them for the purposes of this post. Just note that changing them could also change the outcomes of the analyses.
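To make the chain of assumptions concrete, here is a rough Python sketch of the last steps as described above: units sold per day, times price, times royalty rate, times 365. The function name and every input value are hypothetical placeholders, not figures from Howey's report; in particular, the step from sales rank to units per day is not reproduced here because its formula is not given in this post.

```python
def estimated_yearly_income(units_per_day, price, royalty_rate, days=365):
    """Annualize one day's estimated author revenue.

    Assumes, as the method described above does, a constant rate of
    sales on every day of the year.
    """
    daily_author_revenue = units_per_day * price * royalty_rate
    return daily_author_revenue * days

# Hypothetical inputs for illustration only: 10 units/day, $4.99, 70% royalty.
baseline = estimated_yearly_income(10, 4.99, 0.70)

# Halving the assumed daily units halves the yearly estimate.
half_sales = estimated_yearly_income(5, 4.99, 0.70)

print(f"Baseline estimate:   ${baseline:,.2f}")
print(f"Half the daily rate: ${half_sales:,.2f}")
```

Because the estimate is a simple product, an error in any single assumption carries through proportionally to the yearly figure.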
The Kindle ebooks rankings for the nearly 7,000 books in the sample range from Nos. 1 to 753,309, since the Kindle rankings also include books in other genre categories. To my considerable surprise, my own ebook, Kings of Brighton Beach Episode #1, might very well be included in Howey’s data file, depending on when the data were collected. As of this morning, I’m now at a rankings low of #368,105, still well within contention. Yet I’ve hardly made any money on my writing so far. In fact, I’m still in the red given my outlays for editing, formatting, and cover design.
While I might find my book in Howey’s data, his income analysis would have excluded me, since my sales ranking does not equate to even one book sold per day. Yet these “zeroes” among the bestsellers in his sample were not included in the income findings or otherwise noted. That oversight equates to Howey ignoring the non-sales of 844 of the “best-sellers” in his data, including 9.7% of the indie ebooks and 6.7% of “Big Five” ebooks.
Since I have an LLC, I would have been counted among the “uncategorized” single-author publishers that Howey didn’t bother to categorize but which, based on my analysis of Bowker publisher data from 2002-2012, are almost certainly indie authors. Apparently, we weren’t “interesting enough” in the rankings to track down. Including these counts in the indie story would substantially dilute the results for indie success, as would including the “zero” sales.
Howey has cried foul over my similar analytic decision not to include unconfirmed zeroes in my analysis of income or to swap authors in and out of categories without clarity on where they belonged. So why didn’t he do it when given the chance? As we refine our analysis of these data in the coming days and weeks, we’ll find out, I’m sure.
We’ll let his numbers stand and use them to tell the bigger story—which, if anything, should now be well-placed to show the striking advantages of indie publishing.
Faced with the choices of how to publish, what should an author like myself choose—especially if the goal is to make a living from my writing? What can I expect in terms of income?
While the Author Earnings sample examines the high end of earners, the results are depressingly consistent with the findings I’ve reported from the Digital Book World and Writer’s Digest Author Surveys for 2013 and 2014: While a few authors are making money from their writing, not that many authors make that much money.
In Howey’s data, 944 authors out of the 3,439 authors of the almost 7,000 Amazon best-sellers (with estimated sales greater than one book per day) were estimated to have earned above federal minimum wage ($7.25*8 hours=$58/day) from their best-selling books on data collection day. This number represents more than a quarter of the top-selling writers in the selected fiction genres (27.42%), but it is an extremely small percentage of the writers in these genres with books for sale on Amazon.
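The minimum-wage cut can be sketched in a few lines of Python. This is only an illustration of the logic, not the procedure actually run on the data: the rows below are made up, the function name is hypothetical, and treating exactly $58 as meeting the threshold is an assumption.

```python
from collections import defaultdict

# Federal minimum wage over an eight-hour day, as in the text: $58/day.
MIN_WAGE_DAILY = 7.25 * 8

def authors_at_or_above(book_rows, threshold=MIN_WAGE_DAILY):
    """Sum estimated daily revenue per author, then keep those meeting the threshold."""
    totals = defaultdict(float)
    for author, daily_revenue in book_rows:
        totals[author] += daily_revenue
    return sorted(a for a, total in totals.items() if total >= threshold)

# Made-up (author, estimated daily revenue per book) rows for illustration.
rows = [
    ("Author A", 40.0),
    ("Author A", 25.0),  # a second book pushes Author A over the line
    ("Author B", 12.5),
    ("Author C", 58.0),
]
above = authors_at_or_above(rows)
print(above)
```

Summing per author before applying the threshold matters: an author with several modest sellers can clear $58 a day even if no single book does.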
As a college professor, I personally make more than minimum wage; so I am looking at long odds on having my writing income replace my job income. Assuming I had the choice, are my own odds of making a living at writing better if I go indie or with the Big Five or another publisher?
Certainly, there are more indie than Big Five authors earning above minimum wage in this daily snapshot (486 vs. 302), but to know the probability of hitting the right place on the list, we would need to know the distribution of publisher types across all of the ebooks in the selected genres. What we do know for sure is that there are more indie authors than Big Five authors. Since fewer authors make it through the Big Five gatekeeping process to begin with, it’s entirely possible that my overall probability of hitting a higher point on the list is far better if I squeeze through the Big Five gate at the outset.
What about how much money I will make? The table below presents the median, mean, minimum, and maximum estimated yearly revenue from writing from the authors in Howey’s sample. A statistical comparison of the means (for you data geeks out there, I used the post-hoc Bonferroni analysis in Stata 12) suggests that best-selling indie authors did significantly better than the uncategorized authors and authors from small or medium publishing houses. However, there was no significant difference in indie authors’ estimated income compared to the authors who came through the Big Five. In other words, the table shows differences between the groups, but the difference between indie and Big Five authors is not large enough to rule out, at the 95% confidence level, that it is merely a random fluke that would disappear if we were to draw similar samples on other days.
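For the data geeks, here is a minimal Python sketch of what a Bonferroni correction does in one common form: each raw pairwise p-value is multiplied by the number of comparisons and capped at 1. The group labels follow the publisher types discussed in this post, but the raw p-values are invented for illustration and are not results from the analysis; the function name is also hypothetical.

```python
from itertools import combinations

GROUPS = ["Indie", "Small/Medium", "Big Five", "Amazon", "Uncategorized"]

# Every unordered pair of groups is one comparison: 5 groups give 10 pairs.
PAIRS = list(combinations(GROUPS, 2))

def bonferroni_adjust(raw_p_values, n_comparisons):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    return {pair: min(1.0, p * n_comparisons) for pair, p in raw_p_values.items()}

# Invented raw p-values, for illustration only.
raw = {
    ("Indie", "Big Five"): 0.02,
    ("Indie", "Uncategorized"): 0.001,
}
adjusted = bonferroni_adjust(raw, len(PAIRS))

for pair, p in adjusted.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(pair, round(p, 3), verdict)
```

In this made-up example, a raw p-value of 0.02 looks significant on its own but no longer is once it is adjusted for ten comparisons, which is why a gap that is visible in a table may not survive the post-hoc test.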
The pattern changes if I only look at authors making $58 or more per day, in which case there are no significant differences in average estimated daily or yearly earnings between Indie, small or medium publishers, Big Five publishers, and uncategorized authors.
In both analyses, Amazon authors earn substantially more than anyone else on average. The big winners in the Amazon bestselling ebook story seem to be the authors who published with Amazon’s publishing imprints.
When so few well-known authors have rushed to embrace Amazon as a publisher due to its lack of print outlets, we have to wonder how much of the market is really represented in the figures Howey has collected. All of these numbers raise the question, what about sales outside of Amazon? If there’s no real difference in average Amazon income, do the Big Five or even the small and medium publishers pull ahead in author revenue with print and other digital sales with other retailers and sales channels? Does Amazon as a publisher fall behind?
The research raises as many questions as it answers, if not more. This is why science is an incremental and iterative endeavor. What I find in these new data is confirmation of a number of the things we’ve already reported, and I’m left with greater faith in the integrity of the findings from the Digital Book World and Writer’s Digest surveys, for all of their limitations.
Even when we take a snapshot of successes, we learn many of the lessons I’ve been expounding based on my own studies and the Digital Book World and Writer’s Digest data. The findings are these:
- Authors and publishers face a hard market, and it’s not easy to sell a lot of books.
- Publishing is a segmented market. A very small percentage of authors are in a position to support themselves with their writing, no matter which publishing route they’ve chosen.
- Publishers don’t have a lock on the answers, and the contributions they make to author sales and income are increasingly in question, leading to calls for partnerships that provide greater benefit to authors.
- Self-publishing is making it easier than ever before for more authors to make at least some money, if not a lot of money, on their writing, but these authors are a small percentage of the whole.
For myself and others, I wish I had more optimistic findings that showed we could all share in an incredible gold rush, but the data are the data.
I welcome any questions about how I performed this analysis and my impressions of the data.