June 17, 2013

Say what! Speech recognition gets faster, more accurate on Windows Phone


Your Windows Phone is becoming a better listener.

In a blog post today, the Bing team announced that voice search and voice-to-text—two popular Bing-powered phone features—are now up to twice as fast and 15 percent more accurate, a feat accomplished by exploiting some recent biology-inspired artificial intelligence breakthroughs from Microsoft Research scientists.

Check out the video below to see Bing’s Stefan Weitz and MSR’s Michael Tjalve demo some of these improvements on Windows Phone 8—or just try them yourself (U.S. only for now). Tap the Search button, then the little microphone icon, and then tell Bing to find something. I said, “Show me movies playing in Seattle” and my search results popped right up. You can also try dictating a text message or email (again, just look for the little mic icon in each of those apps). If you’ve never played around with the phone’s speech recognition features, this how-to article is a great place to start.

Teaching a computer to understand the human voice in real time, and in noisy real-life environments, is no easy feat. And it turns out there’s a fascinating backstory to how Bing engineers and their Microsoft Research collaborators pulled it off. The Inside Microsoft Research blog dives deeper into the science behind today’s news.

As you’ll see, it revolves around something called deep neural networks. You pretty much need a PhD to understand the details of this stuff, but simply put, what’s cool about the research is that it draws on biology and the human brain’s natural pattern recognition ability for inspiration. In practice, deep neural networks involve lots of mind-twisting math, racks of fast computers, and a mountain of data to learn from.

As the post notes, deep neural networks show promise for other phone-related applications, too. One is real-time language translation. Imagine popping open the Bing Translator app on your phone, speaking in English, and hearing your voice simultaneously translated into Mandarin—and the Chinese-speaking voice even sounds just like you.

Don’t expect your phone to do that any time soon. But as MSR’s Rick Rashid demonstrated at a recent conference (a story described in today’s post), it’s not necessarily science fiction either, and it’s just one example of what deep neural network research might someday make possible.