Sound and Fury: Audiobook vs. Text-to-Speech

By Emily Williams, co-chair, BISG Rights Subcommittee

“The Librarian of Congress has announced the classes of works subject to the exemption from the prohibition against circumvention of technological measures that control access to copyrighted works. Persons making noninfringing uses…will not be subject to the prohibition against circumventing access controls until the conclusion of the next rulemaking….

(6) Literary works distributed in ebook format when all existing ebook editions of the work (including digital text editions made available by authorized entities) contain access controls that prevent the enabling either of the book’s read-aloud function or of screen readers that render the text into a specialized format.”

Rulemaking on Anticircumvention

The Library of Congress ruling last week re-opened a controversy from last year over whether enabling text-to-speech capability (a.k.a. TTS, a computerized voice that “reads” the text of an ebook aloud) on a digital device like the Kindle constitutes copyright infringement. In February 2009, when Amazon released the Kindle 2 (oh those heady days!) with improved TTS technology, the Authors Guild took a hard-line position that this computerized voice rendering infringed on audiobook rights to any book for which it was enabled.

Amazon initially pushed back, arguing that TTS was an accessibility feature rather than competition for audiobooks, but the etailer was eventually forced to back down and allowed publishers to disable TTS on a book-by-book basis.

Audiobooks got me through the most boring job I ever held. I’m very fond of this form of “reading”, and I have to admit my reaction to the Authors Guild stance was a big fat “Huh?”

TTS technology has gotten a lot better over the past few years and it can sound almost like a normal person speaking, but the idea that it can replace a good audiobook still seems like a stretch. To me, it’s equivalent to saying that Google Translate infringes upon the translation rights of an ebook. TTS and Google Translate are good, pragmatic solutions for basic understanding of a text when no better alternative exists, but no one would ever confuse them with a nuanced adaptation created by humans.

The Money Question

My opinion aside, it seemed high time to let the audiobook producers weigh in on the subject, since they’re the ones whose livelihood depends on a future for the spoken word book adaptation.

On the question of whether he’d see any conflict in acquiring rights to a book for which TTS was enabled, Troy Juliar, publisher of Recorded Books, said his company has done so, though “many authors and author representatives are requiring that Amazon or anyone else offering it disable the function, especially if an audiobook is available.” Juliar is sympathetic to the accessibility issues that drove the original development of TTS, but points out the technology “was not intended to replace an audiobook or deprive authors of the separate stream of income and royalties that audiobooks provide.”

On the glass-half-full side, Hugh McGuire, founder of LibriVox (an open-source project to create public domain audiobooks), sees a potential financial boon for authors in TTS: “If the concern from authors or agents or rightsholders is that this is going to eat into audiobook sales, the flip side of that is that it’s going to speed up actual book reading time if there are people who are listening in this way, so they’re more likely to buy another book next week than if you block this ability. The cumulative effect of allowing people to consume books whenever they want, wherever they want, however they want, will increase their engagement with books and overall will be a good thing for the industry rather than somehow taking money out of it.”

McGuire is concerned about the road publishing has to tread into an increasingly digital and competitive future, and sees danger in dictating what readers can and can’t do with the books they buy. “Particularly this was coming from the Authors Guild,” he added, “to my mind this is an action where the authors are trying to block their readers from reading stuff the way they want to read it, and I just think that’s a bad game to be in. The Authors Guild should be doing everything they can to promote as much reading as people can possibly do.”

As far as the threat to audio publishers like LibriVox, McGuire is dismissive. “To me it speaks to a certain lack of faith in what audiobook producers are doing. If audiobook producers can be replaced by text to speech functions then I don’t think they’re doing anything all that interesting.”

High Performance

Donald Katz, founder and CEO of Audible (now an Amazon subsidiary), which is both an audio publisher and the biggest vendor of downloadable audiobooks, sees TTS as irrelevant to Audible’s mission. “TTS may be a short-term utility for some,” says Katz, “but it is not a challenge to our growing business at Audible. Synthetic speech is entirely separate from the continuous elevation in the quality of the performance of literature our listeners download every day. The tremendous skill of the professional actors and narrators who have turned audiobooks into a performance art is a large part of why audiobooks have proven so addictive for our members.”

When it comes down to it, Troy Juliar doesn’t sound all that worried either. “Since 80 percent of our titles are fiction, we do not feel terribly threatened by TTS,” he says, pointing out that in audio not all genres are created equal. “TTS may threaten business and self-help audiobooks somewhat, where listeners are simply aiming to hear practical information. TTS may be adequate for those purposes in some limited cases.” When it comes to “true storytelling”, however, Juliar is confident in his company’s model of hiring professional actors to interpret books.

“We sincerely doubt a listener will break into laughter or tears because of a TTS rendering of a work of fiction. The human ear is too sophisticated for that.”

One of those actors, Luis Moreno, who has an MFA from Columbia and has narrated audiobooks for Recorded Books, couldn’t agree more. “Wouldn’t you rather have Ian McKellen, or Jim Dale, or the author herself performing a book, than a human-sounding voice that never processed the thought behind the words? It’s like that robotic doctor thing, trying to express ‘empathetic’ sentences when someone tells the terminal that they’re not feeling well.”

Indeed, Juliar emphasizes that for audiobook listeners it’s not just about an audio rendering of a text; they value “hearing an actor who is trained and talented enough to carry a story with his or her voice.” This connection to the voice behind the recording can be very strong: “Many audiobook listeners are deeply attached to the actors who read their favorite stories. Some narrators can fill an auditorium with hundreds of enthusiastic fans when they travel. Fans at these events often describe a curiosity and an emotional connection to the creative process of an actor interpreting an author’s work.”

This is a work of art in itself, the human connection to a story that good audiobook producers strive to create, as Juliar explains. “Many of our fans would feel betrayed if they thought their enthusiasm for and connection to an author’s work could be reduced to binary code.”

Emily Williams is co-chair of the BISG Rights Subcommittee and a former literary scout who currently works as an independent publishing consultant.

2 thoughts on “Sound and Fury: Audiobook vs. Text-to-Speech”

  1. Kassia Krozser

    I’ve been arguing that TTS is an accessibility issue for years. For those who are visually impaired (or mobility impaired, for that matter), TTS is a poor substitute for technologies like JAWS, but those assistive technologies come at a price that isn’t easily affordable for everyone. And while I am not sure how JAWS parses EPUB files, or even if it can — it can parse standard HTML, of course — TTS on reading devices increases the pool of books available to readers. I noted mobility impaired above, and as we consider the aging baby boomer population, this type of technology will become even more critical.

    Audiobooks are entirely different animals, and like many of those quoted here, I don’t see TTS functionality supplanting a professionally read text. TTS cannot pick up on nuance the way a human can. Perhaps someday this will change, but I am very pleased that right now, the Library of Congress has taken steps to make it easier for people to “read” the books they’ve purchased. The Authors Guild lost a lot of credibility with me when they chose to pursue a position that disenfranchised readers.

    1. Alternative Publisher

      What happens when text-to-speech technology gets better? Make no mistake, the people behind devices with TTS want this to be considered a real alternative. Plus, if you think that some people, any number however small, will find it useful rather than buying the audiobook, there is a clear rights infringement in place. Sorry, but I do not believe people buy e-readers to listen to synthetic voices, so there should be no place for it, or more to the point, for the copyright infringement that entails.


