Challenges of Speech Recognition

“True simplicity does not reveal the tremendous effort it requires.” (attributed to Somerset Maugham)

During the research phase of our Version 3 language solution, we posed what at first seemed like a straightforward challenge: we wanted learners to talk out loud more. Since 1995, our software had included speaking practice, but we found that many learners, either by choice or oversight, did not routinely use the feature. We wanted an order of magnitude increase in how much our learners talked. Of course, one can just turn up the amount of speaking required in the curriculum, but we found a number of challenges as we tested a series of prototype experiences with lots of speech included.

First, people remain self-conscious about speaking to computers; despite much progress in improving multi-modal interfaces and reducing speech recognition error rates, most people still do not talk to their personal computers. This is slowly changing as speech interfaces are being featured in everyday devices like cars and phones.  (Part of the reason they get mentioned in advertising is their novelty to many observers.) For most of our learners, their speaking experience with Rosetta Stone is the most they’ve ever talked to a computer in any language, much less the one they’re learning. So we needed a speech experience that was easy and consistent, that made speaking feel like clicking – a simple, intuitive response to a context presented in the curriculum.

Second, by default, learners seem to expect detailed feedback each time they are asked to talk. This is perhaps a natural consequence of the novelty of a speech interaction, but if not addressed it greatly constrains how much we can use speech in the product design. Imagine a helpful native speaker practicing a conversation with you, and after every phrase you say, they point out all flaws with it, in detail, and ask you to repeat it several times. While this undoubtedly would improve your pronunciation, it would quickly become so tedious that it would likely limit how much of this kind of practice you’d want to do in the future.  We needed to support a variety of levels of feedback during the learning process, from lightweight, nodding encouragement to the former kind of deeper analysis.

Finally, we are constrained by the limitations of the underlying speech recognition technology. As noted above, much progress has been made to reduce error rates over the past 50 years, yet significant challenges remain, especially in a context such as language learning. Early non-native speakers make a constantly evolving set of mistakes as they develop an ear for the language and get used to putting phonemes together in combinations they’ve never mouthed before. We needed a solution that would work for learners of any background through these stages of learning, and would work across 25 languages.

To find solutions within these constraints, we applied a variety of techniques, from design to technology to organizational changes. We founded a speech group within Rosetta Stone Labs, focused solely on delivering world-class speech recognition for language learning. This group is now a dozen people, including five Ph.D. researchers, and growing. They provide Rosetta Stone with our own core, proprietary speech recognition technology, which has allowed us to build our own models across many languages that have been tuned and tested with native and non-native speech.

We also found that by tuning up or down the visual prominence of the speaking task and the resulting feedback, we could set expectations in learners that allowed them to proceed through practice in the core curriculum with less need to slow down and perfect each phrase, but still provide other places where they could get rich feedback on their pronunciation.

Finally, we worked hard to keep the interface simple and consistent throughout all modes of interaction. Whether the learner is clicking the mouse, typing, selecting, or speaking, in every case the “game” is the same: completing a visual puzzle that has several missing pieces. By establishing a consistent metaphor across these modes, we were able to remove some of the oddity that learners initially felt when talking to their computers.

In external testing prior to launch, we were gratified to find that learners talked dramatically more often than in our prior courses, and more importantly, they found it completely natural to speak their new language to a computer.  We continue to search for new ways for learners to engage with speech through technology.  In future RVoice posts, members of the Rosetta Stone Labs speech team will explain how we’re continuously raising the bar to give our learners the best speech experience possible.

  • Petter Amundsen

    I find this feature indispensable. My mouth needs practice and I repeat all the sentences even if it is not required. Getting around the shyness of speaking out is the key. The privacy of your desk makes you comfortable. The tongue responds to repeated movements just like hands do. And you don’t get to be a musician by just listening to music, do you? With enough repetitions you cannot help but speak well. The only challenge is that the natives you may encounter often think you know more words because of your apparent fluency.
    Keep up the great work!

  • anita stout

    I can’t imagine the process without the speaking portion. I’d like to see a section where word pronunciation is stressed for particular words that are very common but are spoken too quickly for me to grasp in a sentence. For instance, I can’t hear the word “tiene” fast enough in a sentence to be able to pronounce it well enough to get past the “talk” portion without an embarrassing score. I’m in the high 90’s to 100% in every other category. I’m starting a blog…”so you think you can learn a language?” Any interest in advertising on it? I don’t want bogus ads from rip-off sites since I’ll be talking about my experiences with your program.

    • rvoiceadmin

      Hi, Anita. Thanks for your comment! You might find our Speech Analysis feature helpful for practicing pronunciation. This feature allows you to listen to a native speaker’s voice at a slow speed and record yourself saying a phrase as many times as you’d like while comparing your pronunciation to that of native speakers. You can launch this feature on almost every phrase in a Rosetta Stone Version 3 product. To access the Speech Analysis screens, all you need to do is click on the Speech Analysis icon that is located on the image of the phrase you want to repeat. If the Speech Analysis icon isn’t active, you can click on the Answers icon to make the Speech Analysis icon active.

      Regarding your advertising question, please feel free to visit the Affiliates page on our website and click on Business Partnerships.

  • Karen C

    Dear Rosetta Stone folks,

    First, the praise: RS is a cool concept of a program, for the most part. I love the vocabulary sections most especially–helps me in terms of some of the words I’ve forgotten over the years, both the basic and the complex. I like how each lesson “builds upon” the previous ones; a truly ingenious way of teaching.

    However, the voice recognition is SERIOUSLY lacking. And I mean, seriously. I have messed around with the microphone settings to no end and it still doesn’t recognize when I pronounce a VERY basic word properly. I’m not an advanced French speaker, but I sure know more than enough French already (as in, before purchasing RS) to say the number “one” (“un”) or the word “cup” (“tasse”). I have been left frustrated and angry because RS refuses to pick up on words I know for an absolute FACT I am pronouncing perfectly.

    Ultimately, not a bad program, but I’m finding the speech recognition part utterly useless, and I didn’t pay almost $600 to not be able to use an integral part of this program.

    I’d urge RS to fix these bugs, as I know I’m one of quite a few who have tried RS and have had this problem. I wanted to like–no, to LOVE–Rosetta Stone, and hopefully attempt to learn other languages after I mastered French. However, $600 is far too much to spend on a program when one-third of it barely works. I will not be purchasing RS again.


    • Rosetta Stone

      Hi Karen, if you haven’t already, call us and do a sound check with us 1-800-434-8913. There may be a problem with your microphone that we can help you fix. We’d love nothing more than to help you try to fix this problem. Thanks for letting us know you’re having trouble!

  • chan

    The speech recognition is an okay idea. But, hate to be brutally honest, it sucks. I have tried everything. I know basic Spanish and can communicate. I purchased 1-3 to get better and become more comfortable. I am completely frustrated. The speech portion does not register ANY word. I paid way too much money to not be able to utilize this function. I am at wit’s end and am ready to return the product I just received in the mail.

  • Dee

    I was excited when I first started the Rosetta Stone Learning Series. I had some previous experience with Spanish, but I wanted to improve my pronunciation skills. Knowing I would receive feedback was an attraction for me. However, after Unit 3 of level three, I began receiving minor error messages advising me that speech recognition files were missing and that speech recognition would be disabled. I discovered that the problem occurs now in all of Spanish 3 and Spanish 1 lessons. I talked to technical assistants who did everything they could to fix the problem. Both times the technicians ended up sending me new language and application CDs. I tried to be hopeful each time, but was disappointed to realize that each CD came with the same error message. The only speech recognition I receive is that my microphone setup was successful. I am disappointed that there is apparently no answer to this problem. I hope there will be one in the future. For now, I am using multiple choice when I should be speaking.

    Good Luck,

  • Jim Sherhod

    I began using my French level 1 edition and loved every minute of the first two units; however, the speech recognition system is proving to be very frustrating in unit 3.

    When I was failing with even very simple words I asked my half French wife to try. Despite her beautiful French accent she is also deemed unproficient in pronouncing basic French according to your system! This suggests that there are grave issues with the system you are using….

    I love everything else about it and it is a real shame that this key component is letting it down. I assume that following the feedback received thus far on this blog you will be developing an upgrade to fix the problem that will be released to all of your customers? If not I feel there should be some form of compensation to help mitigate the frustration caused by the lack of functionality.

    • Rosetta Stone

      Hi Jim, part of the problem may be that the speech recognition is set for a specific accent. If your wife doesn’t have the same accent, she would run into problems too! You might want to try changing the speech precision level, which you can see under ‘set preferences’ > ‘audio settings’. Lowering that should help. If that doesn’t help you can set up a sound test with our support team: Just click on ‘submit request’. They can help you to narrow in on the problem. Either way, we should be able to get the speech recognition working for you soon. Thanks for touching base with us!

  • Dee

    I am happy to finally have access to speech recognition. It was a long time of frustration. Thank God, I finally found a level two technology assistant who could help me. I am using Spanish I and III. I have noticed that I have heard two pronunciations for the same word. I realize that there is one pronunciation used in Spain and another in at least some parts of Central America. When I didn’t follow the suggested pronunciation for one, I did receive an error for my pronunciation. When I switched to the pronunciation offered, I got the green light. I will let you know if that pattern changes.

    • Rosetta Stone

      Hi Dee,
      Glad to hear that your speech recognition seems to be working properly now. We do have separate language packages for Spanish from Latin America and Spain, so you will be marked according to pronunciation within the program you’re using.

  • Doyin

    Okay, the numbers 4 and 5 got me really angry. I’ve tried saying them again and again, but it keeps telling me I’m wrong. The weird thing about it is, I’ve looked up how to pronounce them and watched videos by native French speakers on how to say 4 and 5, and it sounds EXACTLY the way I say it. The weird thing also is in a sentence like, for example, “Je mange trois sandwich”, I pronounce it right. I don’t know what’s going on, but in a sentence I pronounce the words right and yet, as a single word I can’t get even one right. This is really frustrating me.

    • Rosetta Stone

      Hi Doyin, you can always turn down the precision level. That way, you will be able to get through saying the numbers more quickly. Then you can turn the precision back up when you’re going over sentences. Are you getting the answer wrong after you say the number, or before you even have a chance to get the word out?

  • Kornel

    I have to agree with many of the comments above that raise concerns about the speech recognition in your product. Frankly, I expected it to be bad, because until recently the technology simply didn’t exist at a usable level. Now I purchased your product because when I call 411 and a computer perfectly recognizes what I am saying (even though I have an accent in English) and gives me the number, I have to say the technology has come a long way. Somewhere on your website you say you created your own speech recognition software. I think this is a huge mistake. Please consider buying it from a major player in this field, preferably the best one. I feel you are trying to solve what is easily one of the hardest problems there is in computer science. You will not be able to compete with the likes of AT&T, Microsoft, etc. Why not just buy this part from a company like that and concentrate on the rest of your otherwise brilliant product? I will call the number mentioned above to check my sound as you suggested. But I think my sound is OK.
    I still love RS though because it is still teaching me Spanish at a lightning speed compared to any other method. And it’s fun. But I often have to skip the pronunciation, when out of 40 tries it doesn’t get what I am saying.

    • Rosetta Stone

      Hi Kornel,
      We’re sorry to hear that you are having difficulties with our product. Based on your comments, we think you might be experiencing a headset setup issue. If you haven’t tried already, please call us at 1-800-434-8913 and we can work with you to check your headset.

      Unlike other speech recognition technologies, ours not only can hear what you say, but also can measure how well you’ve said it. It also works in over 30 languages and across a number of operating systems and mobile devices—something that most existing speech recognition providers simply cannot match. And our commitment to enabling language learning through technology means we will continue to innovate along all aspects of language learning.

  • Al

    I have a couple of observations that relate to intermittent problems that I have encountered with the voice recognition function in Rosetta Stone software. My experience is with all five levels of German and levels 1 & 2 of Italian and Latin American Spanish. The problems are similar in all three languages, but seem to be more of an issue in Spanish. First, I would mention that the microphones included with the software in all 3 language sets were either defective from the outset or malfunctioned after a few days or weeks of use. I ended up purchasing a good quality headset at Best Buy for not too much money and have not needed to replace it in the year or two since.

    Now the problem with the voice recognition is similar to that reported by some of the previous bloggers. Specifically I encounter very simple words standing alone that the software simply cannot “understand”, even though it “understands” the same word perfectly well when it is embedded in a sentence. In trying to debug one particularly intractable problem of this kind, I went into the section where I could see the sonogram and hear what the software was analyzing. I discovered that the problem was that the first part of the word was being cut off. In other words the software is not picking up the first fraction of a second of sound. I discovered that if I grossly exaggerated the first sound in the first word, or if I “hum” for a fraction of a second before saying a word that starts with an unvoiced consonant, then the voice recognition problem is solved, although at the cost of speaking in an unnatural manner. I don’t know whether this problem is inherent in the Rosetta Stone software or is specific to my computer system. I am using a Sony VAIO laptop to run the software.

    A second problem with the voice recognition software, that seems to be specific to the Spanish version and is not so much of a problem with German or Italian, is that it doesn’t understand me well when I talk at normal conversational speed. It has no problem with my speech if I talk more slowly than seems normal. That may be a problem with my speech, but I haven’t noticed that Spanish speakers have much trouble understanding me when I talk at a normal speed.

    Those are just my observations, for what they are worth. On the whole, I find the voice recognition software in Rosetta Stone a very useful tool for brushing up my conversational fluency in these languages.

    • Rosetta Stone

      Hi Al, thank you very much for your thoughtful feedback. We’ll pass your observations along to the appropriate department. Out of curiosity, were all three language products Version 4 TOTALe?

  • Al

    My experiences as described above are with version 3 products. In thinking about the problem after making the previous post, I wondered whether just waiting a little longer after the ding before I started talking would have the same effect. I have since tried that solution, and it seems to work. My conclusion is that there seems to be about a second delay between the ding and the time the computer starts listening to what I am saying, at least on my system. The reason that I posted my experience is that it sounds to me like some of the other posters are having similar problems. I wonder whether this observation might solve some of their problems too.

    • Rosetta Stone

      Hi Al, thanks so much for posting and sharing your experience. We’re glad that you were able to solve the problem!

  • Emma

    I was given Japanese level 1 for Christmas, and completed Core module 1 yesterday. I decided to give module 2 a go today, but for some reason, my microphone or recognition software is not working well today. I know some common words, like cat and dog (neko and inu), but the software is marking me wrong, whether I say it with my natural accent, a faux Japanese accent, a normal speaking tone, extra slow and clear or even shouting. (I may have gotten a little frustrated). It is also not picking up on half the sentence I am saying (for instance, when I say “kare wa gohan o tabeteimasu” it only picks up “kare…gohan…tabete…” and gives me a poor mark, even though I know I am saying it correctly, and clearly).
    I am confused as to what is wrong, and I have tried starting later after the beep, and turning down the levels as suggested above.
    I don’t think anyone will be available to call today, haha, but I’ll keep restarting my computer, and the program, and see if it improves.
