You must speak dozens of languages, right?

Whenever I tell a new person about what I do – research in Automatic Speech Recognition for Rosetta Stone – the first thing they always say is, “Oh, so you must speak – what, about 30 languages?”

Well, not exactly.

My knowledge of foreign languages is similar to that of many Americans and Canadians. I took Spanish in school. I can hold my own in a simple conversation. I can read Borges in the original. Because I’m Jewish and I had a Bar Mitzvah, I can also read Hebrew and lately I’ve been learning to speak it with Rosetta Stone TOTALe. But that’s it, other than a few phrases I might have picked up while traveling.

So how is it that I can train computers to judge pronunciation in over 31 languages?

The thing I usually tell people is this: there are standard acoustic modeling methods we can apply to any language, as long as we have two really important ingredients.

Ingredient number one is training data. This means hours and hours – hundreds of hours – of recordings of native speakers in each language. These training recordings need to match the type of speech we’ll eventually be recognizing: will it be read speech, or spontaneous? What dialect are we expecting? What age will the speakers be? What gender? A mismatch in any of these factors can have a profound effect on how well the recognizer performs. When scoring pronunciation, we also need many hours of recordings from nonnative learners of the language to serve as a reference.

And ingredient two is transcripts of those hundreds of hours of recordings. These can either be word-for-word annotations done by native speakers, or they can be close phonetic transcripts written by linguists who have an academic familiarity with the language. Along with these transcripts come dictionaries of word-level pronunciation and comprehensive lists of the unique phonetic sounds each language uses.

With these ingredients, we “teach” the recognizer the sounds of a language by example, demonstrating thousands of instances of each sound in context. For those of us who don’t actually speak 31 languages, this is where we come in. With the recordings and their transcripts, we run complex training routines that amass statistics and “learn” the characteristics each sound is expected to have. And with the pronunciation dictionaries, we give the recognizer an instant vocabulary – a relatively complete list of phonetic sequences that it should expect to see. Of course I am simplifying things – even transcription itself is a laborious process, with a subtle art to doing it right. But essentially the sounds, words, and recordings are arbitrary. They can come from anywhere, from any language.
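To make “amassing statistics” a little more concrete, here is a toy sketch in Python. It is not our actual training code. It assumes each recording has already been cut into frames labeled with a sound, and that each frame is described by a single number. Real recognizers use many numbers per frame and far richer models. The labels, the values, and the function names are all invented for illustration.

```python
import math
from collections import defaultdict

# Hypothetical labeled frames: (sound label, one acoustic feature value).
# The labels and numbers are made up for illustration only.
TRAINING_FRAMES = [
    ("a", 1.0), ("a", 1.2), ("a", 0.8),
    ("i", 3.0), ("i", 3.3), ("i", 2.7),
]


def train(frames):
    """Collect per-sound statistics: the mean and variance of the feature."""
    values = defaultdict(list)
    for sound, x in frames:
        values[sound].append(x)
    model = {}
    for sound, xs in values.items():
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        # Floor the variance so a sound with identical examples stays usable.
        model[sound] = (mean, max(var, 1e-6))
    return model


def log_likelihood(model, sound, x):
    """Gaussian log-likelihood of feature value x under one sound's statistics."""
    mean, var = model[sound]
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def best_sound(model, x):
    """Return the sound whose learned statistics best explain x."""
    return max(model, key=lambda sound: log_likelihood(model, sound, x))


model = train(TRAINING_FRAMES)
```

Nothing in `train` knows which language the labels came from. Swap in frames and labels from another language and the same routine applies, which is the sense in which the ingredients are arbitrary.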

So, no, I don’t speak 31 languages. Not yet, anyway. But our recognizer does need many native speakers and transcribers – thousands of them! – for it to know how a student of a foreign language ought to sound.

  • Matt

    Wow… I didn’t think it’d be THAT hard… I mean, I’m learning Spanish and all I know is I hear the people speaking… and that’s about it, but, wow.

    I can sometimes get the software to approve when I say “El” when I need to say “La” though, but hey, nothing is perfect. With the help of this speech recognition software, I’ve been told twice (so far) that I speak Spanish without an accent! Keep up the good work everyone at RS! This product is the best!

  • Jimzip

    Yeah that is pretty incredible. I was reading recently about how Google Translate works, and the reality is much more complicated than many assume. It’s not a simple matter of a grid of words and phrases, but the result is a very simple tool that does the task super easily. Just like Rosetta Stone, the user never sees the complexities behind the experience. This is good design.

    Jimzip 😀

  • David Lloyd-Jones

    Many years ago I spent a summer checking gas meters around Toronto for a company which subcontracted to the gas company, so learned the necessary minimum — I’m from the gas company… please, thank you, Where’s the light switch… — in 28 languages. Toronto is a very multi-national town.

    One day a girl met me at the door, I sized the place up, and I said in my best Urdu “I’m from the gas company. I’m here to check your meter.”

    She ran screaming back to her grandmother in the kitchen. “Granny, Granny, he speaks our language!”

    “Of course he does, dear,” she said. “He’s from the gas company.”

    -dlj. *

    * Currently doing Chinese for real, with Rosetta Stone as one of my tools. It’s a fine piece of work, a bit overpriced, but only one of the many things you need to get a hold on a language.

  • Ellen

    Love the story David! =D

  • Lindsay

    But, sir, the problem is, I’m always correct when using it, even if I speak languages wrong on purpose, mistake bi and be, ta and ya or something, but still I am always correct when I shouldn’t be…

    • Rosetta Stone

      Hi Lindsay, have you tried increasing your precision level? This will make pronunciation exercises more challenging. Also, you should submit a web ticket through our support site: We can help you test your audio, and troubleshoot any issues with you. Thanks for letting us know about your problem!

  • Lisa

    I am not going through the learninge I bought the French 1-5 program. I am doing OK and keeping an 85% but want to do better. Can you tell me what the TOTALe difference is? Do I still only have the option of one language or can I learn more than one? I know that you have native instructors that help, but other than that, isn’t it more expensive, and then you also don’t have the CDs to fall back on at home?

    • Rosetta Stone

      Hi Lisa, you have some great questions! Version 4 TOTALe is the product sold in the US, Canada, the UK, Japan and Korea. It is available both in disks and in a subscription format, depending on which language you’re interested in. Active online rights also include access to the mobile applications and Rosetta World, an online gaming environment. If you live in one of the countries that sell Version 4 TOTALe, you can upgrade from Version 3: You would still purchase Version 4 TOTALe one language at a time. With the subscription version, you would be able to access all the Levels at one time. With the disks, you would be able to keep them forever, and you’d get free access to Rosetta Studio and Rosetta World for the first three months. With a subscription, the online rights are included for the entire subscription period.

  • J.S Frankel

    Like most kids growing up in Canada (I was born in Toronto) I had to learn French and hated it. Hated the rote memorization, hated the teachers for the most part, hated my own stupidity, thought I’d never be able to learn any foreign language.

    Moved to Osaka in 1989. Learned Japanese using one text, watching TV, and listening. Did it so I could speak to my girlfriends in their language. It must have worked because I married my wife in ’97 and am now fluent in Nihongo.

    Interesting article. Not saying it doesn’t have merit–it does–but for me, the biggest factor was the motivation, the “want” and the desire to learn. I picked up Spanish from my late mother by listening to her speak–she was fluent in that language, along with French, German, Yiddish and had a very good working knowledge of Latin–and I wanted to learn from someone who knew. For me, desire is the biggest thing.
