Look who’s talking: Speech in Mango

On a recent run around town with my wife to grab dinner and pick up one of the kids, a text message came in from my son. Not an unusual event in itself, but what made this message interesting is that my phone read it aloud to me — and I replied back with my voice.

Meet Voice-to-text, a new hands-free messaging feature coming this fall in Mango and one that’s quickly become a personal favorite. And after seeing it in action on my test phone on our drive, my wife looked at me and said, “I want that for my car.”

Voice-to-text works for both text and instant messages, and it’s handy even when you’re not driving since it can slash the time you spend typing—a good thing at times even considering the fantastic keyboard on Windows Phone.

But the feature really shines when being hands-free is a necessity, like when I’m driving. My car has Bluetooth built in, and my Windows Phone is paired with it. When I’m driving and a message comes in, Windows Phone uses the Bluetooth connection and car’s sound system to narrate the message and record my response (pausing and resuming music or the radio if needed). The “conversation” goes something like this:

WP: [music pauses] You have a text message from Cody Pardi. You can say read it or ignore.
Me: Read it.
WP: “When will you be home?” You can say reply, call or I’m done.
Me: Reply.
WP: Say your message.
Me: “In about 20 minutes.”
WP: [The phone transcribes and repeats the message] You can say send, try again, or I’m done.
Me: Send. [music resumes]
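
For readers who like to see the flow spelled out, the exchange above can be sketched as a small state machine. This is only an illustration of the prompts in the transcript: the state names, the `TRANSITIONS` table, and the choice to stay in place on an unrecognized command are all assumptions, not how Windows Phone is actually implemented.

```python
# Hypothetical sketch of the prompt/response flow in the transcript above.
# State names and transitions are illustrative assumptions, not phone internals.

TRANSITIONS = {
    "announce": {"read it": "offer_reply", "ignore": "done"},
    # "call" would hand off to a phone call; it is modeled here as simply ending the session.
    "offer_reply": {"reply": "dictate", "call": "done", "i'm done": "done"},
    "confirm": {"send": "done", "try again": "dictate", "i'm done": "done"},
}


def next_state(state, utterance):
    """Return the state that follows `state` after the user says `utterance`."""
    if state == "dictate":
        # Whatever is said here is the message body; it is then read back for confirmation.
        return "confirm"
    command = utterance.strip().lower().rstrip(".").replace("\u2019", "'")
    # Assumption: an unrecognized command leaves the dialog where it is.
    return TRANSITIONS.get(state, {}).get(command, state)


def run_dialog(utterances, start="announce"):
    """Feed a list of spoken replies through the dialog and return every state visited."""
    states = [start]
    for utterance in utterances:
        states.append(next_state(states[-1], utterance))
    return states
```

Calling `run_dialog(["Read it.", "Reply.", "In about 20 minutes.", "Send."])` walks through the same four turns as the conversation above.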

My initial thought when I used it for the first time was “this is a game changer” because it felt natural to use while driving without being a distraction. And it all just worked. In fact, I was so impressed with the technology I decided to sit down with Alex Perez Avila, a program manager for many of the voice features in Windows Phone, to get an inside look at how it all happens.

Alex works in the Microsoft Tellme team, which develops the voice recognition and text-to-speech technology found in a growing number of Microsoft products including Office, Windows, and Xbox. He told me that competing smartphones are adding some voice features, mostly for existing phone options. Alex and his team, meanwhile, wanted to create something seamless that felt natural for completing everyday tasks such as calling someone in your contacts list or finding a local restaurant. “We think this will set Windows Phone apart,” he said.

Windows Phone taps the Microsoft Tellme cloud service for voice recognition and transcription. “No one else has it,” Alex said, “and we think customers are really going to like it.” The service, he notes, has built-in ways to learn from itself and improve recognition and transcription accuracy over time–all without putting additional software on the phone. The feature, he says, “will just get better and better as more people use it.”

I mentioned to Alex that I noticed my Mango phone can speak modern-day abbreviations such as TTYL (“talk to you later”), LOL (“laugh out loud”), and even the smiley emoticon (“happy smiley face”). I asked him if Windows Phone could translate those back if I spoke them while composing a text message. “Yep. We understand a limited set of key phrases and will transcribe them as abbreviations.” He demonstrated, and indeed it worked as advertised.
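
The two-way behavior Alex describes amounts to a small lookup applied in both directions. Here is a rough sketch using only the three examples from this post; the `SPOKEN_FORMS` table, the function names, the `:)` glyph, and the simple matching rules are all assumptions for illustration, since the actual phrase list isn't given here.

```python
import re

# Illustrative only: these entries are the examples named in the post.
# ":)" is an assumed stand-in for the smiley; the real phrase list is not documented here.
SPOKEN_FORMS = {
    "TTYL": "talk to you later",
    "LOL": "laugh out loud",
    ":)": "happy smiley face",
}


def expand_for_speech(text):
    """Replace known abbreviations with the phrase to be read aloud."""
    return " ".join(SPOKEN_FORMS.get(token.upper(), token) for token in text.split())


def abbreviate_transcript(text):
    """Replace recognized spoken phrases with their abbreviations (case-insensitive)."""
    for abbreviation, phrase in SPOKEN_FORMS.items():
        text = re.sub(
            re.escape(phrase),
            lambda _match, a=abbreviation: a,  # insert the abbreviation literally
            text,
            flags=re.IGNORECASE,
        )
    return text
```

`expand_for_speech` covers the read-aloud direction and `abbreviate_transcript` covers dictation; a real system would also need to handle punctuation attached to a token, which this sketch ignores.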

In addition to Voice-to-text, Alex walked me through several other Speech-related improvements on the way. In Mango, for example, Speech can be triggered even when the phone is locked by pressing and holding the Start button. You also have control over how and when text messages are read. By default, the phone reads messages aloud when connected to a Bluetooth headset or stereo (which is how Windows Phone knows to read my text messages in the car).

Check out Pocketnow’s preview of voice features on the way in Mango.

There are some great new accessibility-related Speech features coming in Mango: using voice to forward calls and set up a speed-dial list. When Alex showed me these, I was impressed. In one very cool example, he stored a number in a speed dial location and then dialed it, hands-free. Other things you can use Speech for in Windows Phone include:

  • Making a phone call by name or nickname
  • Redialing a number
  • Calling voicemail
  • Searching Bing
  • Turning on the speakerphone
  • Starting an app while in a call
  • Navigating Maps

All these features put together make voice an incredibly integrated part of Windows Phone in Mango, and I think it will set the bar for voice-recognition technology in a smartphone. To finish the story I started this post with, I told my wife that if she wanted that voice feature in her car she’d have to get a Windows Phone because her smartphone doesn’t do that.

“OK, fine with me,” she said.

Now that was something really worth hearing.

Bill Pardi is a senior consumer writer in Windows Phone Engineering.

51 Comments
  • If I understand how this works on WP7, it's actually quite brilliant.  For all intents and purposes, it crowd-sources the learning and evolution of the speech-to-text capability.

  • JanSolo 2 Posts

    I noticed this feature just the other day driving home and was completely delighted once I "got it".  I was driving home from work when I heard the very familiar SMS tone, but surprisingly, a "call" was initiated to my car and my phone started talking to me.  Totally surprising and at first, I hung up and thought it was a bug with Beta 2 of Mango.  Wrong.  The next time it happened, I let it go through and much to my surprise, my phone read out the message to me and even let me reply without having to tap a word on my phone.  Honestly, if everyone had a Windows Phone 7 running Mango (and a headset or car with Bluetooth), the number of SMS-related accidents would drop dramatically.  You guys might think of it as just another feature, but it's clearly more than that.  As a mobile product manager (albeit for Android software), I try to stay current on all mobile platforms, and because of this feature, I've put away my Android-powered Motorola Atrix for everyday use and have made my Samsung Focus running WP7 my primary phone.

  • Okay, so, @JanSolo, as an Android product manager, what other aspects (either unique features or features that just happen to be done better) do you think would be useful in persuading folks to give WP7 a shot?  I'm convinced that the best marketing currently available for WP7 is the hands-on experience.  I have to say that being able to demonstrate the speech-to-text capability right there with someone else using a different platform would be very cool ;)

  • Bragic 2 Posts

    Love the voice commands, one of my favorite features from the original release.  I didn't realize in Mango I can tell it to launch an app, "Play Angry Birds" and boom, it starts.  Very awesome.  But, I gotta ask...why no voice commands for music?  I think with Bluetooth in your car, being able to say something like "Play Foo Fighters" would be pretty awesome.

  • GlenC 6 Posts

    Great stuff. What is the voice command to turn on Speakerphone in Mango?

  • I agree with Bragic.  Where is the Voice Control with Zune?!?!  Heck, the MS Sync System in Ford vehicles is AWESOME, I want this on my Windows Phone device.

  • Big question: can all of this (sending and receiving messages and calls) be done while the navigation software stays in front?!? Because when I'm driving and I get a call, I don't want to miss my exit! I don't need my navigation to tell me to exit a highway or make a turn; those things I want to see on the screen of my phone navigation. Messages and calls I don't need to see while driving, I only want to hear them! Please give me an answer, thanks!!!

  • Can you tell us when we can expect the voice features to arrive in other languages (Dutch)? Even if we use another language, are we able to use the English voice features in the meantime, even if it means sending English messages to our friends and speaking English to our phone? Thanks!

  • @Bragic, you can start apps NOW in WP7.  I do it all the time.  I can say, "Open Rise of Glory" and the game starts.  Alas, while you can say, "Open Zune" and it will open the player, you can't tell it to play a song.

  • Can we get a list of the "limited set of key phrases" which are supported? Does the list include a way to add punctuation or create multiple sentences? I've tried using the Mango beta to send texts such as "How far away are you? I'm 5 minutes away from the restaurant." without luck because it just creates one run-on sentence instead.

  • barts2108 65 Posts

    Will the voice commands also be triggered in a car when the car radio is on? If that's the case, I hope there's a single setting to disable all voice commands.

  • Razor 53 Posts

    I love the voice features coming in Mango, but I can't say I ever liked the voice being used by the phone =| Majel Barrett would have been the perfect voice... sigh.

  • My car has actually been reading messages for me since 2007. The application is called Message reader and it is a part of Windows Embedded Automotive, or Blue & Me.

  • +1 for dandrayan's comment. While I haven't done exhaustive experimentation, right now it's pretty useless for sending texts with more than one sentence. But it's possible we just need to figure out the magic words (or wait for actual documentation).

  • It's fab, I have been trying to use it a lot... But it needs a massive amount of work.. It gets a lot wrong currently

  • tsrblke 327 Posts

    @dandrayan

    I don't have the Mango beta or I'd try this myself, but could it have something to do with pauses? When MS put speech to text in Office you had to get your pauses just right to get it to place punctuation.  It was never the right punctuation, but at least it was punctuation.

    My only other thought is they figure with all the spelling errors and weird abbreviations, no one cares about a run-on text ;).

  • I had a similar experience driving in congested three-lane traffic. My Mango phone alerted me via Bluetooth that I had a text. It read it to me and I replied, all without my hands leaving the wheel. Kudos to MSFT!!!

  • Anyone knows if "Speech" works in different languages? When Mango comes out 4 real. Like, could I, who live in Norway do all that stuff with it in norwegian?

  • GoghUA 2 Posts

    Any ideas why I can't get this to work on my Mango (beta 2) phone?  I have the speech settings all enabled and "Read aloud incoming text messages" set to Bluetooth only, but it doesn't seem to work.  Phone calls over bluetooth work just fine.

  • KR 503 Posts

    Though I am still running 7932 (yes, because I love my warranty), I already love the voice commands the way they are and just can't wait for Mango to be unleashed on my phone. I just wish HTC hadn't screwed up the headphones of the HD7 (the call volume is too low).

  • KR 503 Posts

    I know some people might be thinking this feature is useless if you have an accent, but I just want to let y'all know that as long as you say the word right, the phone gets it. I'm French (don't mind my English name), but even when I try the voice commands in English they work perfectly, and though I speak English perfectly, I have a heavy accent in English.

    With this accent handling, Microsoft just took our beloved OS to the next level

    two thumbs up guys

  • mdoan300 10 Posts

    Um, yeah, this feature only seems to work for me when I'm playing around with it in the comfort of home, but never when I really need it to work the most -- when I'm out and about. After a few tries on a few different occasions, I've completely given up on it -- it's great if it consistently works for everybody else, but it's pretty dang useless for me right now.

  • KR 503 Posts

    BTW, I don't know if in Mango it's gonna work in all languages, but the current voice commands work in French as well and I believe that will still be true in Mango, so I am tempted to say YES IT WILL WORK in all languages supported by the phone

  • JanSolo 2 Posts

    @ScubaDog2011  I work with a number of Android developers, and the way I've convinced them that WP7 has something to offer is, as you mentioned, a hands-on demonstration.  They love the webOS-like multitasking and the speech functions as well as the fluid UI.

    Microsoft absolutely needs to demonstrate features to both salespeople on the floor at mobile retailers as well as on their commercials.  It feels like Microsoft's marketing agency doesn't think that a feature war is the way to go, but it's features like these that make WP7 a compelling consumer platform, so Microsoft, please puff up your chest and show off WP7's features in your marketing efforts, please.

  • Thanks for all the great comments and questions. I'm tracking down some answers and will jump back in with comments soon.

  • acctman 2 Posts

    Turning on the speakerphone and Starting an app while in a call... how do you do those two?

  • To preface these comments, I LOVE LOVE LOVE the TellMe stuff in Mango.  It's natural and intuitive.

    Two things:

    1. - Since using the Mango betas, I've been getting this error intermittently when I attempt to voice dial:  "Sorry can't use Speech right now to make a call.  Please use the dialpad.", or something very similar.  Pre-Mango, I never had that problem.  Why did this start?

    2. - I live near FM-621 in San Marcos, TX, and when Bing Nav reads off directions, it refers to that as "Federated States of Micronesia" instead of "Farm-to-Market".  How would I tell someone that that needs to be changed besides here?

    Keep up the awesome work; Mango FTW!

  • ckruhs 1 Posts

    Great feature!

    Calendar/Task and Email integration would be awesome.

  • Can the voice features be used by 3rd party applications?  Would I be able to create an app or game that requires some sort of voice recognition, and leverage the existing technology already used in the phone?

  • @GoghUA, might the problem be with your Bluetooth device?  I know some devices don't have all the necessary BT profiles.  For instance, the first BT earpiece I used didn't support music--but it worked quite well with the most basic commands (Call, Open, Find).  Now I use the Jawbone ERA and everything that WP7 can do is fully supported by my earpiece.  I'm looking forward to the expanded capabilities with Mango.  Just a thought.

  • Oh, @Razor, if only........Majel Barrett, RIP.

  • ldgregg 1 Posts

    Sweet!  I've been involved in speech apps for over 20 years and it is finally coming of age and RELIABILITY in WP7.  I've been impressed with my HD7's ability to understand and respond with accuracy to my voice Bing searches (even with background noise).

    Now, if I could only download Zune tunes using my ZunePass subscription, I think I'd just about have it all!

  • I hope that when my 3 yr old son keeps yelling, "Oodles of Noodles" in the typical 3 yr old style (gibberish-sounding), he doesn't keep contributing to ruining the algorithm!

  • Very cool to have this feature as a core part of Mango.  I have a free app called Vlingo on my Android (EVO 3D) and it does this and more.  It includes an "always listening" mode which is great.  So I just say "Hey Vlingo" and I can start telling it to do stuff like compose a text message or read email.  Although I usually have that mode disabled since I'm usually listening to music in my car.  I can see a lot of other people using it, though.  WP7 might actually start being usable to me someday if they keep adding new features like this.  Hopefully WP7's TTS/Speech features will be highly customizable.  Vlingo is very customizable which is fantastic.

  • samsabri 12 Posts

    I'm running the latest Mango build on my device. Sending a text with my voice works fine, but I have yet to get it to read it back to me when someone sends me a text. Anyone have any advice to get it working?

  • Hanu 2 Posts

    Yeah, I really like it. It improved a lot in Mango.  @acctman, to turn on speaker ex: try "call Bill Gates on speakerphone".

  • My regular non-smart phone could read text messages to me. I was not happy when I switched to WP7 and it could not do that or even send text messages via voice. Now I see it coming and I am hoping it will work the same way when Mango finally gets to us.

  • I wear hearing aids with a Bluetooth interface and I am connected to my phone at all times. I love this new feature and have used it constantly since I loaded the Mango Beta. People in my office are amazed I can communicate this way. After using this feature, I will never go back to reading or typing texts. In fact, I rarely take the phone out of the holster. I can text, make calls and answer calls all by voice commands.  This is really cool!

  • weilian 1 Posts

    I'm not sure where to send feedback and bugs or where to get help on my issue so I'll post it here and hope someone may have an answer.

    I have speech enabled for when it is connected to Bluetooth and I have Bluetooth enabled to connect to my car.  It works flawlessly with incoming calls or making calls, which I access through my vehicle's interface; it connects to the speakers and mic in the car. However, it is not so with text messages.  When I receive a text message my music cuts out, but the confirmation message is played through the phone's speaker and not through the car's speakers as one would expect.  It also uses the handset's mic instead of the car's mic.  So you can imagine how much of a pain it is to use.  I can't hear the message and can't make voice commands as the phone is somewhere in my pocket while driving.

    2011 Audi A4...  any ideas?

  • @Bragic, @ChrisLynch and others asked about music commands using voice. That didn't make it for Mango, but it's on the radar and being looked at for a future release.

  • @Johannespreekt asked about whether voice messaging will allow navigation apps to stay in front. I can't speak for all apps, but I know that incoming messages don’t interrupt the built in turn-by-turn directions or the music player. There is an overlay when the phone is in “listening mode” but that is dismissed when the voice session is over, and the phone returns to whatever it was doing.

  • @dandrayan asked what key phrases and commands are supported. To get a great summary of what you can do with voice in Mango, put the phone in listening mode by holding down the Start button, then tap the help button in the upper right of the overlay. There you can page through several short pages that list the things you can do with voice on Mango. You should also keep an eye on Windowsphone.com for new content we’re working on that outlines the speech features in Mango.

  • Thom 31 Posts

    This is a very useful feature when it works but I have a couple of issues

    1) The voice is awful. it's the same robotic female voice used throughout the phone but it can't read properly. This is especially apparent in the Bing Navigation where it will pronounce things like the UK town St. Alban's (Saint Albans) as Street Albans. It will also cut off sometimes before the full direction is given and just try getting it to say "take the second exit" at a roundabout, it comes out with "take tuh-huh exit"

    2) There is no way to trigger this text-to-speech function on demand, so if I miss the automatic voice prompt I've missed out. It would be great if you could put a "read aloud" button on the context or tap&hold menu. Alternatively, you should be able to choose whether the message will be read if it doesn't hear a response. Many times a text has come in while I have my basic wired headphones plugged in but because there is no mic it won't read it.

    3) No matter how hard I try it doesn't understand me when I try the Speech-to-Text function. I get nonsensical unreadable gibberish. This means it would actually be quicker to pull over and type the thing out! I have a "normal" English accent.

    4) It only works on bluetooth headsets by default. I understand why this has been done but it hides the feature from people who are wearing wired music headphones. I think it's better to have it enabled for both and let people turn it off (@GoghUA go to Settings>Speech and check the option listed at the bottom)

    I'm sure it's very clever in the background but the actual human side of it is lacking

  • Cannot wait to play with this when Mango's released! :)

  • jacksg 4 Posts

    Sweet...!

    Hope I get this feature on my device soon..(not only for US right i presume...)

  • Nice! We have an application called PhoneSocial+ that does exactly the same thing with your tweets. Although you still can't reply with voice commands, you can listen to your tweets while you drive. Check out: weblogs.asp.net/.../introducing-phonesocial-for-windows-phone-by-uruit.aspx

  • Thanks for the reply, Bill.  I had already discovered the help text, I was hoping for more of those key phrases you had mentioned.  Punctuation and multiple sentences were the two biggest things I was interested in.  Otherwise, I think the team has done a great job with this feature!

  • Hi - does anyone know of an easy way to switch this on and off (i.e., easier than going to Setup, Speech, etc?).  I want it to work while driving but during meetings with other people in the room, I want to turn it off.

  • Arron25 1 Posts

    Another feature I found in the phone was 'Ease of Access..Speech for phone accessibility', where the caller number / name is read out.. but it will only read out through the main speaker..not through the Bluetooth headset. Is there a fix / hack for this feature to make it redirect to the headset? Nokia tell me it is a Microsoft problem..

  • This is fine for speaking in the car, and is a nice little feature.  But it does not begin to address my most important need with a Windows Phone.  My frustration with my Windows Phone is that it has such seamless Office support, but has a lousy little electronic keyboard that is not fit for anything other than texting.  I am in meetings constantly and need to take minutes.  I have a wonderful program to be able to do that in, but cannot pair it with a folding portable full-size keyboard like the Freedom Pro, which I happen to like very much.  I would like to strangle Windows for changing the HID Bluetooth capabilities for these devices.  I am tempted to turn in my phone and go back to pad and paper at these meetings.

  • So I love my phone and the voice commands. Is there a way to disable only the instructions? They are getting a bit old. I know what I am supposed to do now.