05/02/2013 - CereProc's Wonder Emporium of Text-to-Speech - I Said What?!

There is a more sinister use of text-to-speech technology than the traditional IVR and accessibility applications - to alter what a person has actually said. Dangerous, huh?

In 2004, The UK Daily Telegraph music critic Neil McCormick called Auto-Tune a "particularly sinister invention that has been putting extra shine on pop vocals since the 1990s" by taking "a poorly sung note and transpos[ing] it, placing it dead centre of where it was meant to be."

A bit of background... Unit selection synthesis works by creating a database of speech - a library of sounds and audio recordings. The closer the speech output we synthesise is to the contents of the database, the better the rendition is likely to be. So, if we ask the synthesiser to say something identical to a sentence held within the database, it will produce a perfect rendition, just like a recording. This makes it easy - given enough data - to modify a recording in order to change the message spoken.
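To make the idea concrete, here is a toy sketch of the selection step. It is an illustration under loud assumptions, not how any production synthesiser is implemented: the units are whole words rather than the much smaller speech segments a real system works with, the database is two hypothetical transcripts, and the function names `select_units` and `join_count` are invented for this example. It greedily covers the target sentence with the longest runs of words found in the database, so fewer joins means output closer to a plain recording.

```python
from typing import List, Tuple

# A selected unit: (database sentence index, start word, number of words).
Unit = Tuple[int, int, int]


def select_units(target: List[str], database: List[List[str]]) -> List[Unit]:
    """Greedily cover `target` with the longest word runs found in `database`."""
    selected: List[Unit] = []
    pos = 0
    while pos < len(target):
        best = None
        for s_idx, sentence in enumerate(database):
            for start in range(len(sentence)):
                # Count how many consecutive words match from this start point.
                length = 0
                while (
                    pos + length < len(target)
                    and start + length < len(sentence)
                    and sentence[start + length] == target[pos + length]
                ):
                    length += 1
                if length > 0 and (best is None or length > best[2]):
                    best = (s_idx, start, length)
        if best is None:
            raise ValueError(f"no recorded unit for {target[pos]!r}")
        selected.append(best)
        pos += best[2]
    return selected


def join_count(units: List[Unit]) -> int:
    """Number of places where two separately recorded runs are glued together."""
    return max(len(units) - 1, 0)


# Hypothetical database: transcripts of two recorded sentences.
database = [
    "i don't believe there is such a thing as too much joy".split(),
    "i believe in you".split(),
]

same = select_units(database[0], database)
changed = select_units("i believe there is such a thing as too much joy".split(), database)
print(join_count(same), join_count(changed))
```

A sentence already in the database comes back as a single unit with no joins, which is the "just like a recording" case. The altered sentence needs only one join, between "i believe" from the second recording and the remainder of the first.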

Below are examples of this modification. Can you tell which sentence was spoken by the person and which is synthesised?

Heather

I don't believe there is such a thing as too much joy.



I believe there is such a thing as too much joy.



Answer? The first sentence is a recording and the second sentence is synthesised.

Obama

It's gonna take bold and immediate action to confront this crisis.



It's not going to take immediate action to confront this crisis.



Answer? The first sentence is a recording and the second sentence is synthesised.

Dubya

I'm encouraged to see that Iraqi political leaders are making good progress toward forming a unity government.



I'm discouraged to see that Iraqi political leaders are not making progress toward forming a unity government.



Answer? The first sentence is a recording and the second sentence is synthesised.

William

I am guilty of murder.



I am guilty of kidnapping.



Answer? Both of the William samples above are synthesised.

It may seem that we have gone to the dark side, but modifying speech like this can be very useful. In automated caller menus or customer services, for example, it is often only a small part of a sentence that needs to be changed dynamically. It may be just a single number, a company name, or a male first name within the sentence. If you create a voice with the right database, the synthesis - speech output - you produce can be impossible to tell apart from real speech. From our research, we've seen that this level of customisation of automated text-to-speech systems can sometimes make all the difference to a customer or caller.

Here is an example changing small items of a stock phrase:



The original recording:



This approach is especially useful when no synthesis artefacts of any type are acceptable. This is typically the case where the current approach is to use only pre-recorded speech, for example in most current computer games. By designing the database properly you can record far fewer cues and get the same high-quality output.
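A rough back-of-the-envelope sketch shows where the saving comes from. Everything here is an assumption made for illustration: the slot sizes are made-up numbers, the function names are hypothetical, and the slot-based count is a best case, since in practice each value may need recording in more than one context to sound natural.

```python
from math import prod
from typing import Sequence


def cues_fully_recorded(slot_sizes: Sequence[int]) -> int:
    """Recordings needed if every combination of slot values is recorded whole."""
    return prod(slot_sizes)


def cues_with_slots(slot_sizes: Sequence[int], carriers: int = 1) -> int:
    """Best-case recordings needed if the carrier phrase and each slot value
    are recorded once and combined at synthesis time."""
    return carriers + sum(slot_sizes)


# Hypothetical game line with two variable parts:
# 200 character names and 50 item names.
slots = [200, 50]
print(cues_fully_recorded(slots), cues_with_slots(slots))
```

With these made-up figures, recording every combination takes 10,000 cues, while one carrier phrase plus one recording per slot value takes 251.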

The subject of games takes us to our next exhibit. Stay tuned for Whubat fubun! Hubavubing Fubun Wubith Lubangubuage.

In case you missed them, here are links to the Wonder Emporium Part 1, Part 2, Part 3, Part 4 and Part 5.
