Q&A with DeepZen

As we head toward Digital Book World 2021 this September, we'll look at a few companies playing important roles in the future of publishing. DeepZen is one of those. A Q&A we did with them earlier this year is below.

1. What is DeepZen?

DeepZen (www.deepzen.io) is a UK-based AI company that has developed ground-breaking AI voice and natural language processing technologies that replicate the human voice to create a listening experience that is virtually indistinguishable from the real thing.

Our mission is to make high quality audio content accessible to everyone, removing barriers, and bringing the amazing benefits of our cutting-edge technology to the widest possible audience.

DeepZen’s innovative Text-To-Speech (TTS) technology helps publishers produce high quality audiobooks, and helps corporate audio producers and content creators produce a range of audio content, without the time and cost constraints of traditional production methods.

DeepZen’s AI voices are licensed from voice actors and narrators and produced using next-generation AI algorithms. They capture the elements of the human voice, such as pacing and intonation, along with a wide range of emotions, to produce more realistic speech patterns.

DeepZen produced the world’s first digitally narrated audiobooks to be sold across major audiobook vendors and distributors, including Apple Books and Google Play.

Our new Publisher Portal provides a high quality, convenient, and cost-effective production service that converts text into audio format, in approximately half the time it takes with traditional studio production, and at approximately half the cost.

Our Voiceover business creates high quality voiceover content for a wide range of companies. We provide two service options: a managed production service, and an API service for customers who prefer to manage their own production.

The company’s goal is to extend the availability of audio content across the world, in different languages, and to help creators and businesses employ the advantages of its technology to scale up.

2. How did the company start?

DeepZen was founded by friends and entrepreneurs Taylan Kamis and Kerem Sozugecer in 2018. Both had been working in big tech and they shared a passion for technology, AI and machine learning.

The interaction between humans and machines was of particular interest to Taylan. Although the perfect replica of the cognitive part of human intelligence is still a way off, Taylan believed that you could build systems that could perfect some tasks and make life easier, more accessible, and fun for people.

Inspired by the 2013 Spike Jonze film “Her” (where a man develops a relationship with an AI virtual assistant), they wanted to see if they could create a process by which a machine could understand a piece of text and then read it out as a human would, with all of the emotional understanding and natural speech and intonation patterns of a real person.

The goal was to build a system that would generate long-form audio, and the founders quickly identified audiobooks as a clear use case. They tailored development to create a unique solution that would overcome the challenges traditional TTS has with long-form content, namely a lack of context, intonation and emotional quality. In 2018, they started working with a team of AI and Natural Language Processing experts from around the world to develop a proprietary system that could read text and convert it to audio.

Development work extended into voiceovers, with considerable demand from companies in education, entertainment, content production and synthetic media. DeepZen extended its solution to support these industries and also started to provide an API solution with a real time feature to create business models that were not possible before.

3. Share with us the impact DeepZen’s technology has had on the audiobook production and publishing market, with one particular example of a client either increasing speed to market or reducing cost.

DeepZen’s technology has transformed the production of audiobooks by providing a high quality, convenient, and cost-effective production service. It reduces the time it takes to convert a manuscript into an audiobook by at least 50% compared to traditional methods, meaning publishers can get their books to market in under three weeks.

DeepZen is also currently working with two of the Top Five publishers, employing its technology to tackle real-life challenges. Last year, when the pandemic struck and print galleys were inaccessible, locked in warehouses or unable to be shipped from China, both publishers were looking for a way to produce audio galleys that didn’t compromise on quality. DeepZen stepped in to provide a cost-effective, high quality solution. The pandemic and the success of the audio galleys have prompted both publishers to rethink their long-term strategy and look at increasing the use of digital galleys, which are not as reliant on shipping and world events.

4. Branded voices are becoming more of a focus for companies engaging with voice technology and conversational AI. What should marketers at any company, whether publishers or not, be thinking about with regard to creating a branded voice for their own organisation?

There are four important criteria that companies producing voiceover content should consider when deciding on a branded voice. These are: 1) Availability 2) Continuity 3) Flexibility 4) The emotional capabilities of a synthetic voice.

Companies need to be able to produce voice content quickly and often at short notice. Traditionally, they would need talent to be available at a moment’s notice whenever a news event happens or a new directive lands. The beauty of AI voice technology is that it enables companies to replicate a talent’s voice without the talent needing to be present. This means that while the talent is away filming halfway across the world, their synthesised voice, based on recordings of their own voice, can be used to create new audio content. Only AI-generated synthetic voices can provide this level of flexibility.

Companies that want to produce voiceover content should make sure they use the best AI systems, capable of editing and controlling the voice so that it includes the proper intonation, pacing, and range of emotions that make it sound lifelike. People do not want to hear the robotic, monotone voices that we’re all familiar with. There is a need for real, natural-sounding voices that people can connect with.

5. What new features or developments does DeepZen have planned for the rest of the calendar year 2021?

To increase our range of voiceover options, we’ve just launched an app-based solution which brings two new product offerings to market.

The first, VideoMaker, is a tool which allows you to bring your presentation to life by adding digital voiceover to slides and then turning your presentation into a video in MP4 format.

The second is a fast synthesis solution called VoiceMaker. This enables you to enter your text into the platform, choose your voice, and create audio output quickly, without the need for platform integration at the back end.

We’ll soon be launching our Publisher Portal, which provides a high quality, convenient, and cost-effective production service that converts a manuscript into an audiobook, or a company text or whitepaper into audio format, in approximately half the time it takes with traditional studio production, and at approximately half the cost.

We are also expanding into providing a high quality real-time service with our “viseme” feature and streaming services, which will allow us to increase the range of options we can offer to companies with voiceover requirements.

We will also start to support Spanish, French and Portuguese later in 2021.