Why Is Text to Speech Only an Afterthought?

Expert publishing blog opinions are solely those of the blogger and not necessarily endorsed by DBW.

I spend a lot of time commuting to and from work in my car, and I try to use the time wisely. I cycle through a playlist of podcasts every week, but I feel like I’m missing out on other types of content. Regardless of your daily commute, I’ll bet you’d feel the same way if you stopped to consider the possibilities.

I’m thinking mostly about short-form content, such as website articles, whitepapers and other documents. If someone sends me a link or I discover an interesting article online, it’s highly likely I won’t have time to read it immediately. That’s why I typically save it in Instapaper or Evernote.

This approach has turned me into an article hoarder, as I have countless unread articles in both Instapaper and Evernote. So while I thought my problem was a lack of time at that moment, the truth is I rarely have time to read many of these things later either.

To its credit, the Instapaper app for Android has a text-to-speech feature built in. But the way it’s implemented tells me it was added as an afterthought. Sure, I can tap the “Speak” button and sit back and listen, but how useful is that when you’ve got a bunch of 2-4 minute articles stacked up and you’re trying to go hands-free while driving along the highway (or taking a walk, or running on a treadmill, etc.)?

Publishers sometimes talk of engaging with the consumer who’s reading their content while standing in the proverbial grocery store check-out line. Next time you’re in line at the grocery store, look around. Nobody reads like that. Some people have their phones out, but they’re probably scanning Facebook or sending a text message. Rather than heads-down reading, you’re more likely to see people with ear buds in, listening to music while they shop or wait in line. And let’s face it: nobody reads while they’re running or doing other strenuous activities.

So along with all those “send to” buttons for various social and “read later” services, why isn’t there one built exclusively for text-to-speech conversions that open up all sorts of new use cases for content consumption?

The service has to do much more than just transform text to audio, though, as there’s an important UI component that needs to be considered. The entire platform has to be audio-based, including voice commands. Picture an app on your phone that has all the voice command capabilities of Siri or Alexa, for example. Whether you’re driving or running, all you’d have to do is say things like “skip,” “next article,” “archive,” “annotate,” etc. The user should be able to manually create playlists, and the service should offer the option of automatically detecting topics and placing each article in a relevant folder (e.g., sports, business, DIY, etc.).
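To make the idea a bit more concrete, here is a rough Python sketch of what that command layer might look like. It assumes some speech recognizer has already turned the spoken phrase into text, and every name in it (Article, Player, handle_command) is hypothetical, invented for illustration rather than taken from any existing app.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    text: str


@dataclass
class Player:
    """Hypothetical hands-free player holding a queue of saved articles."""
    queue: list                                   # articles waiting to be read aloud
    archived: list = field(default_factory=list)  # articles the listener is done with
    position: int = 0                             # index of the article being read

    def current(self):
        # Returns None once the listener has moved past the last article.
        if self.position < len(self.queue):
            return self.queue[self.position]
        return None

    def next_article(self):
        if self.position < len(self.queue):
            self.position += 1

    def archive(self):
        # Move the current article out of the queue; the next one slides into place.
        if self.current() is not None:
            self.archived.append(self.queue.pop(self.position))


def handle_command(player, transcript):
    """Map recognized speech (already converted to text) to a player action.

    Returns True if the phrase was a known command, False otherwise.
    """
    commands = {
        "skip": player.next_article,
        "next article": player.next_article,
        "archive": player.archive,
    }
    action = commands.get(transcript.strip().lower())
    if action is None:
        return False
    action()
    return True
```

Commands like “annotate” or “share” would slot into the same lookup table; the hard parts (reliable speech recognition in a noisy car, and the text-to-speech itself) are left out of this sketch.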

And don’t forget the social aspect and opportunities here. Using voice commands, I should be able to quickly and easily share an interesting article via email, Twitter, etc. Let me also keep track of the most popular articles other users are listening to so I don’t miss anything that might be gaining momentum.

One business model option is probably quite obvious: insert short audio ads at the start of each article, similar to the plugs I’m hearing more frequently in podcasts. And since the article topic and keywords can be identified before streaming, it’s easy to serve highly relevant ads that are closely aligned with the articles themselves—think Google AdSense for audio. Give publishers an incentive to feature new “send to audio” buttons on their articles by sharing that well-targeted ad income with them.

Doesn’t this seem like it’s right in Google’s wheelhouse? I suppose they’ve got bigger fish to fry, but this looks like an existing marketplace gap that’s just waiting to be filled.

This article first appeared on Joe Wikert’s Digital Content Strategies.


4 thoughts on “Why Is Text to Speech Only an Afterthought?”

  1. Will

    There is an app for iOS that provides many of the points you make. It is called Voice Dream Reader. It can accept plain text files, PDFs, Pages and MS Word docs, non-DRM ePub files, and other formats. It can download saved articles from Instapaper, Evernote, Pocket, Dropbox, iCloud, Google Drive, OneDrive, Box, and the web. It can utilize Apple’s built-in Siri reading voice, but also can use 186 other downloadable high-quality voices available for 30 languages. Articles can be organized as playlists and notifications made to let you know when a new article begins. It also utilizes iCloud sync so you can stop listening on an iPad, for instance, and pick up later on an iPhone where you left off. You can customize word pronunciation, control reading speed, word and sentence highlighting, highlight and annotate text and share/export highlights and notes, and much more.

    One area where Voice Dream differs from what you describe is that it does not offer Siri-like voice command control.

    I find it useful for the situations you mention. I use it to catch up on articles while at the gym or commuting. On long drives I’ve listened to ebooks from the Gutenberg Project.

  2. Michael W. Perry

    Thanks for the article. Like you, I make double use of my time driving, walking or working around the house by listening to audiobooks and podcasts. I like your idea for giving voice commands to players. It would not even need to introduce the complexities of Siri-like services. It could simply have a teach mode in which readers would say whatever word in whatever language they want to have the player pause, skip ahead or whatever. That’d be both simpler and more accurate.

    Indeed, lots of apps would benefit from user-trained voice commands designed for each application’s particular purpose, so much so that Apple and Google might want to build the idea into their developer tools. A weather app, for instance, might display the weekly forecast when the user says “weekly forecast.” No klutzing around for how to display that. And “Weather in Chicago” would display the city’s weather for someone about to go there. Yes, Siri can do that. I just checked. But for some specialized functions, building the feature into an app would be a great benefit.

    Your mention of keywords brings up something that I’d be delighted to see added to Instapaper. That’s auto-generated keywords. ‘Is the word in the article’ searching generates too many false positives for searches. The auto-keywords would display more intelligence. A term that’s used only once in a 2,000 word article wouldn’t become a keyword. One that’s used five times would. In addition, those auto-generated keywords could be used to classify an article as politics, sports or whatever.

    If ads are used to pay for this, my advice would be to keep them short. I don’t mind 15-second ads before a video. Thirty seconds is a bit much, and I never hang around for a one-minute one. Those sponsoring those are wasting their time and money. The same would be true for audio ads. Don’t imitate network TV. Keep them short.

  3. Joe Wikert

    Will, thanks for the tip on Voice Dream Reader. I see they also have an Android version so I’ll be sure to check it out. And while it’s great that portions of this vision exist, it’s the full, end-to-end solution I’m looking for, especially for ebooks that are DRM’d. IOW, in addition to what I described in the article, I want an ebook reader as well that has this functionality built-in, not a free-standing app that I need to redirect files to.

    Michael, yes, you’re right that the ads need to be short. Again, I’d liken them to what I’m hearing at the start of some of the audio podcasts I subscribe to where the ads are typically about 10 seconds long.

  4. Dianne@ Tome Tender

    I have to say, this is a fabulous post, BUT, I for one use my Kindles for reading books while driving, through my car speakers, while shopping, (earbuds), while working out, (earbuds), gardening, cleaning, etc. My workouts are not sissy stuff, we are talking MMA kickboxing style, weight training, etc. I adjust the speed/gender of the voice. I know several readers who do this! Simple belt and case hold it securely out of the way for home tasks. For shopping, my purse becomes the carrier (BTW, this is the ONLY reason I carry a purse). Sometimes people just need to get creative, newer cars have Bluetooth…no Bluetooth? One earbud will do, no laws broken…:) Same applies for documents!…Because sometimes pure entertainment is good.


