Look who’s talking: Speech in Mango

On a recent run around town with my wife to grab dinner and pick up one of the kids, a text message came in from my son. Not an unusual event in itself, but what made this message interesting is that my phone read it aloud to me, and I replied with my voice.

Meet Voice-to-text, a new hands-free messaging feature coming this fall in Mango and one that’s quickly become a personal favorite. And after seeing it in action on my test phone on our drive, my wife looked at me and said, “I want that for my car.”

Voice-to-text works for both text and instant messages, and it’s handy even when you’re not driving since it can slash the time you spend typing—a good thing at times even considering the fantastic keyboard on Windows Phone.

But the feature really shines when being hands-free is a necessity, like when I’m driving. My car has Bluetooth built in, and my Windows Phone is paired with it. When I’m driving and a message comes in, Windows Phone uses the Bluetooth connection and car’s sound system to narrate the message and record my response (pausing and resuming music or the radio if needed). The “conversation” goes something like this:

WP: [music pauses] You have a text message from Cody Pardi. You can say read it or ignore.
Me: Read it.
WP: “When will you be home?” You can say reply, call or I’m done.
Me: Reply.
WP: Say your message.
Me: “In about 20 minutes.”
WP: [The phone transcribes and repeats the message] You can say send, try again, or I’m done.
Me: Send. [music resumes]
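
For readers who like to see the flow laid out, the exchange above can be sketched as a small state machine. This is only an illustration built from the prompts in the transcript: the state names, the `TRANSITIONS` table, and both functions are hypothetical and say nothing about how Windows Phone actually implements the feature.

```python
# Hypothetical sketch of the prompt flow in the transcript above.
# State names and commands are illustrative, not Windows Phone internals.
TRANSITIONS = {
    "announce": {"read it": "read", "ignore": "done"},
    "read": {"reply": "compose", "call": "call", "i'm done": "done"},
    "confirm": {"send": "sent", "try again": "compose", "i'm done": "done"},
}


def next_state(state, utterance):
    """Return the state that follows a spoken command."""
    if state == "compose":
        # Any speech in this state is treated as the message body.
        return "confirm"
    # Normalize case, curly apostrophes, and trailing periods before lookup.
    command = utterance.strip().lower().replace("\u2019", "'").rstrip(".")
    # Unrecognized commands leave the state unchanged so the prompt can repeat.
    return TRANSITIONS.get(state, {}).get(command, state)


def run_dialog(utterances, state="announce"):
    """Feed utterances through the flow and return every state visited."""
    visited = [state]
    for utterance in utterances:
        state = next_state(state, utterance)
        visited.append(state)
    return visited


if __name__ == "__main__":
    print(run_dialog(["Read it.", "Reply.", "In about 20 minutes.", "Send."]))
```

Keeping the allowed commands in a table mirrors the way the phone announces the valid choices at each step ("You can say send, try again, or I'm done").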

My initial thought when I used it for the first time was “this is a game changer” because it felt natural to use while driving without being a distraction. And it all just worked. In fact, I was so impressed with the technology I decided to sit down with Alex Perez Avila, a program manager for many of the voice features in Windows Phone, to get an inside look at how it all happens.

Speech dialog box

Alex works on the Microsoft Tellme team, which develops the voice recognition and text-to-speech technology found in a growing number of Microsoft products including Office, Windows, and Xbox. He told me that competing smartphones are adding some voice features, mostly for existing phone options. Alex and his team, meanwhile, wanted to create something seamless that felt natural for completing everyday tasks such as calling someone in your contacts list or finding a local restaurant. “We think this will set Windows Phone apart,” he said.

Windows Phone taps the Microsoft Tellme cloud service for voice recognition and transcription. “No one else has it,” Alex said, “and we think customers are really going to like it.” The service, he notes, has built-in ways to learn from itself and improve recognition and transcription accuracy over time–all without putting additional software on the phone. The feature, he says, “will just get better and better as more people use it.”

I mentioned to Alex that I noticed my Mango phone can speak modern-day abbreviations such as TTYL (“talk to you later”), LOL (“laugh out loud”), and even :) (“happy smiley face”). I asked him if Windows Phone could translate those back if I spoke them while composing a text message. “Yep. We understand a limited set of key phrases and will transcribe them as abbreviations.” He demonstrated, and indeed it worked as advertised.
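
The two-way translation Alex described can be pictured as a simple lookup table used in both directions. The sketch below is an assumption for illustration only: the `ABBREVIATIONS` table holds just the two expansions quoted in this post, and the function names are made up, not part of any Windows Phone or Tellme API.

```python
import re

# Only the expansions quoted in this post; the real phrase list is not public here.
ABBREVIATIONS = {
    "TTYL": "talk to you later",
    "LOL": "laugh out loud",
}


def expand_for_speech(text):
    """Replace known abbreviations with the phrase to be read aloud."""
    def speak(match):
        word = match.group(0)
        # Words that are not in the table are passed through unchanged.
        return ABBREVIATIONS.get(word.upper(), word)

    return re.sub(r"[A-Za-z]+", speak, text)


def abbreviate_transcript(text):
    """Replace known spoken phrases with their abbreviation."""
    for abbreviation, phrase in ABBREVIATIONS.items():
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, abbreviation, text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    print(expand_for_speech("Almost home, ttyl"))
    print(abbreviate_transcript("Running late, talk to you later"))
```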

In addition to Voice-to-text, Alex walked me through several other Speech-related improvements on the way. In Mango, for example, Speech can be triggered even when the phone is locked by pressing and holding the Start button. You also have control over how and when text messages are read. By default, the phone reads messages aloud when connected to a Bluetooth headset or stereo (which is how Windows Phone knows to read my text messages in the car).

Check out Pocketnow’s preview of voice features on the way in Mango.

There are some great new accessibility-related Speech features coming in Mango: using voice to forward calls and set up a speed-dial list. When Alex showed me these, I was impressed. In one very cool example, he stored a number in a speed dial location and then dialed it, hands-free. Other things you can use Speech for in Windows Phone include:

  • Making a phone call by name or nickname
  • Redialing a number
  • Calling voicemail
  • Searching Bing
  • Turning on the speakerphone
  • Starting an app while in a call
  • Navigating Maps

All these features put together make voice an incredibly integrated part of Windows Phone in Mango, and I think it will set the bar for voice-recognition technology in a smartphone. To finish the story I started this post with, I told my wife that if she wanted that voice feature in her car she’d have to get a Windows Phone because her smartphone doesn’t do that.

“OK, fine with me,” she said.

Now that was something really worth hearing.

Bill Pardi is a senior consumer writer in Windows Phone Engineering.