The Blizzard Challenge 2008

Written by Gareth Halfacree

May 16, 2008 | 09:03

Tags: #programming #speech-synthesis #text-to-speech

If you're interested in helping advance the science of text-to-speech synthesis, you're needed as part of the Blizzard Challenge.

The Challenge is an annual event hosted by the University of Edinburgh's Centre for Speech Technology Research in which programmers are given 10,000 sentence-length recordings of a person from which they must create a working speech synthesis engine. Once each team has completed their engine, the results are uploaded for people like us to listen to and rate.

Speech synthesis is an important technology, and one which gets criminally overlooked in these days of multi-gigabyte storage and the ability to record voiceover artists in high fidelity. Text-to-speech engines are vital for partially sighted people, whose screen-reader software all too often sounds like a cross between Stephen Hawking and a Dalek. An engine as flexible as a real human voice also holds the promise of massively improved immersion in games, with vast swathes of text transformed into realistic speech without the need to hire actors and expensive studios.

In order to make things easier for the teams involved, the Blizzard Challenge has traditionally used a neutral voice for the basis of the engines – one without a particularly strong accent and as emotion free as possible. This time round, however, the Challenge is to create a working engine from a voice sample which has a lot more personality than usual. While this makes things a lot harder for the programmers, it holds the promise of an engine capable of producing a voice that doesn't sound permanently bored.

If your last experience of text-to-speech synthesis was with the Say program on your Amiga 500 Workbench floppy, then you'll be pleasantly surprised by how advanced some of this year's entries are. If you want to participate, you can sign up on the project homepage. It'll only take about an hour of your time, and it's well worth it.

Do we have any partially sighted visitors relying on screen readers, or are we all just looking forward to seeing the technology advance to a point where it can be used to put more speech into games like Oblivion? Share your thoughts over in the forums.
