One of the many new features coming in the next release of Windows Phone (a.k.a. Mango) is Music search, a built-in song recognition feature jointly developed with researchers on the Bing team. We haven’t said much about it yet, so recently I sat down with several members of the Augmented Reality team (how cool is that name?) in Windows Phone Engineering to find out what it does and how it works.
We met in the second-floor office of Elliot Kirk, who was responsible for testing the new feature. Kirk’s office is stuffed with the tools of his trade: stereo equipment, a handheld decibel meter, and a gilded, football-sized cockroach impaled with a nail (his job is killing software “bugs,” after all). Joining us were program manager Steve Cosman and Houston Wong, the feature’s lead programmer. If you like what they have to say, be sure to check out the new Channel 9 interview with Steve about Music search.
Q: So, guys, first tell me: What can I do with Music search?
Elliot: The basic scenario is that you’re listening to some music you’ve never heard, or you hear a song you like but can’t remember its name. In Mango, you can just pull out your phone and within seconds get the name of the song or artist, plus a link to the Zune music store so you can download or buy it.
Steve: Anything you can buy in Zune Marketplace you can find with Music search.
Q: Some apps in Marketplace can already do this—the identifying part, at least. How is Music search different?
Steve: Most other apps listen to a song for a fixed amount of time, and then analyze and try to match it. One of the things we do differently is we’re continuously listening and analyzing. As soon as we know what the song is, we return the result to you.
Houston: In the extreme case, that means you might get near-instant results.
Q: That’s cool. How does Music search work?
Steve: We’re using the microphone to record and then doing something called “fingerprinting,” where we look for unique acoustic features of the music. We listen for about 3 seconds, create a fingerprint, and then we send that fingerprint to Bing, which looks for a match in the Zune music catalog.
Elliot: If it doesn’t find one, we send another 3-second slice until we get a result.
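The listen, fingerprint, and query loop that Steve and Elliot describe can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Windows Phone or Bing implementation: `identify`, `toy_fingerprint`, and the in-memory catalog are hypothetical stand-ins, and the toy fingerprint simply hashes the raw bytes, whereas a real acoustic fingerprint summarizes features that survive background noise and re-recording.

```python
import hashlib
from typing import Callable, Iterable, Optional


def toy_fingerprint(audio: bytes) -> str:
    # Hypothetical stand-in: a real acoustic fingerprint encodes robust
    # spectral features; here we just hash the slice's raw bytes.
    return hashlib.sha1(audio).hexdigest()


def identify(slices: Iterable[bytes],
             fingerprint: Callable[[bytes], str],
             lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Fingerprint each ~3-second slice and query until something matches."""
    for audio in slices:
        fp = fingerprint(audio)  # only this short fingerprint is sent, not the audio
        match = lookup(fp)       # stand-in for the server-side catalog search
        if match is not None:
            return match         # return as soon as the song is known
    return None                  # audio ran out without a match


# Usage with a hypothetical one-song catalog keyed by fingerprint.
catalog = {toy_fingerprint(b"chorus"): "Toxic"}
print(identify([b"talking", b"chorus", b"outro"], toy_fingerprint, catalog.get))
```

If `slices` is a lazy stream (for example, a generator reading from the microphone), the loop stops pulling audio the moment a match comes back, which mirrors the "continuously listening, return as soon as we know" behavior Steve contrasts with fixed-length listening.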
Members of the Augmented Reality team in Windows Phone: (from left) Houston Wong, Steve Cosman, and Elliot Kirk.
Q: You don’t transmit the actual audio?
Steve: No, we don’t—which means we’re using less of your data plan. And it’s generally quicker to use fingerprints since we’re matching against a very large data set of music: millions and millions of tracks.
Q: So someone has already scanned all the tunes in the Zune catalog and created a library of digital fingerprints for each song?
Steve: Yep, exactly. At that point it’s pretty much just a straight-up search. We look at the fingerprint we’ve created on the phone, compare it to the millions of fingerprints generated from tracks in the Zune music catalog, and see what matches.
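Comparing one fingerprint against millions doesn’t have to mean scanning every track. One common approach, assumed here because the interview doesn’t say how Bing’s index works, is to treat each fingerprint as a set of small integer hashes, precompute an inverted index from hash to tracks, and let the query’s hashes vote. The function names, the sample catalog, and the `min_votes` threshold below are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set


def build_index(catalog: Dict[str, Set[int]]) -> Dict[int, Set[str]]:
    """Precompute once: map each fingerprint hash to the tracks containing it."""
    index: Dict[int, Set[str]] = defaultdict(set)
    for track, hashes in catalog.items():
        for h in hashes:
            index[h].add(track)
    return dict(index)


def best_match(index: Dict[int, Set[str]], query: Iterable[int],
               min_votes: int = 3) -> Optional[str]:
    """Count how many query hashes each track shares; return the top track
    only if it clears the (hypothetical) min_votes threshold."""
    votes: Counter = Counter()
    for h in set(query):
        for track in index.get(h, ()):
            votes[track] += 1
    if not votes:
        return None
    track, count = votes.most_common(1)[0]
    return track if count >= min_votes else None


# Hypothetical catalog: each track's fingerprint as a set of integer hashes.
catalog = {
    "Toxic (album version)": {1, 2, 3, 4, 5},
    "Louie Louie": {6, 7, 8, 9},
}
index = build_index(catalog)
print(best_match(index, [2, 3, 4, 99]))  # three shared hashes: a match
print(best_match(index, [6, 99]))        # one shared hash: below threshold
```

This sketch simply returns the track with the most votes; a production matcher would also have to break ties such as the karaoke-versus-album-version and same-song-on-25-albums cases Steve describes.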
Q: How was Bing involved?
Elliot: This started as a Bing research project. They developed the fingerprinting algorithm. The Bing team has been amazing. It’s been a great experience working with them.
Q: Are some kinds of songs more challenging to match than others—say, all those covers of “Louie Louie” or samples of original songs embedded in other songs, like you find in hip hop?
Steve: Picking the right track is an interesting problem, and we’re still working on it. One issue is getting, for example, the German karaoke version of Britney Spears’ “Toxic” instead of the multi-platinum U.S. album version. There’s also the situation where the identical hit song is on 25 different albums. How do you figure out which one to return? That’s another problem we deal with.
Elliot: But the fingerprint is actually getting good enough that we can distinguish the album version of a song from the live version, as long as that album is in the database.
Steve: No one plays a song the exact same way twice. Even your ear can’t detect the differences we can.
Coming in Mango: Tap the music note icon in Bing to start a new Music search.
Q: Are there other situations Music search finds challenging?
Steve: It’s fine if there are voices talking over the music. But if you sing along with a song, you might screw up the detection process.
Elliot: When you sing over a song you actually alter the timing of it, so the fingerprint we create doesn’t quite match the original.
Q: This feature sounds like it must have been a real challenge to test.
Steve [gesturing to Elliot]: His test stories are the best. At one point, Elliot spent about a week running around to everybody’s office to find out who had the quietest one for our tests.
Elliot: We wanted to know the lowest and highest sound levels we could detect. I also came in late on a Saturday evening, hoping no one else would be around, to find out how loud we could go. It turns out to be around 120 decibels, which is almost as loud as a jet engine. I had to upgrade from PC speakers to a full 110-watt receiver with surround sound. I had my earplugs in and my fingers over my ears, and I just cranked it. That was fun. You could hear it all the way across the building.
Q: What song did you use?
Elliot: I think it was Britney Spears’ “Toxic.” It was painful.
Steve: I wouldn’t have admitted that if I were you.
Q: Anything else?
Elliot: We tested the last 10 years’ worth of tracks from the Billboard charts to make sure all the most popular songs are detectable.
There were also a lot of constraints that we had to either model or go out and test in the real world, like background noise in the places where we think most people will use the feature. I spent a lot of time driving to and from work at different speeds with my windows open by different amounts, just to make sure we could recognize songs with the window all the way down, the car doing 40 m.p.h., and the music blaring.
Steve: You should never use your phone while driving.
Elliot: I also spent a lot of time in bars, but not necessarily just for testing.
It's a great feature; however, I was a bit disappointed because I was trying to find information about a musical piece and its composer. There's no way to turn a music recognition result into a search query about the piece. Please add this feature to this great tool.
Notice that he says "but I wouldn't draw the conclusion that you'll never have Mango."
...but he can't be positive about it.
Rev 1.4 is a hardware issue and will not be fixed.
The more I read this blog, the more excited I'm getting about Windows Phone and the possibility of purchasing one in the future. I'm currently on the iPhone 4 and while I love it, Windows Phone just seems to 'speak' to me.
Any news yet on new hardware coming to AT&T this year? Possibly another HTC Win7?
@Michael, thanks for the link to Jason's site. It now happily resides on my home screen.
(Just don't tell Steve Jobs that I have Windows Phone Blog on my iPhone. It just may break his heart!)
Great interview! I really hope this works better than Shazam. Shazam is decent for new releases, but with older songs or genres other than pop, it struggles. As a salesman, I'm really excited to see this in action and be able to show it to my customers.
Yes, like Michael said it is not a device specific issue but an operator one (T-Mobile).
Thanks! And don't worry, I'm not a hater :D
@gadgetebz, comeradealexi: Eric just posted on Orange and Omnia. I'm reluctant to make predictions (because if I'm wrong you guys will probably hate on me), but the scent of progress is in the air. Still, no *official* news yet.
Any news on the Omnia 7 Orange UK situation would be greatly appreciated!
Great feature. Good job, Microsoft! I find Mango more and more innovative by the day. Now, how long before Apple copies? ;)
@Juan, my HD7 has French (France), French (Canada), French (Swiss), German, Italiano, and Spanish, and my phone is up to date. BTW, I'm in Canada.
This does sound great. I'm a big fan of Shazam (sorry), but if this is quicker and more accurate, there's no reason I wouldn't jump ship. If it uses Zune's database, though, how (if at all) can it identify pre-release tracks? Shazam is surprisingly good at identifying club remixes or early-play tracks from the radio... Looking forward to trying it out!
This stuff is all really exciting, but I'm still pretty keen just to get Copy and Paste... It's good to hear that many people are working on solutions to the Samsung issues, but it's not unreasonable for us to ask why Orange Omnia 7s haven't been updated when other Omnia 7s have. Any chance of an answer? Are legal issues keeping you all silent on this front? Orange users had to pay £50 more to select the Omnia over the HTC!
@MichaelStroh - Hey, I'm in the UK on Orange with an Omnia 7, still waiting for the NODO (NOSHOW) update :-)
This looks like a great feature, but will it be US-only like so much of the good stuff? I was truly embarrassed trying to demo voice search in Bing to some friends, just like Joe B did in his keynote, only to fail big style. Later I found out that voice search in Bing was a US-only feature!
What's the deal with the new stuff? A nice, clear table would really help manage people's expectations, I think.
Looks great! I can't wait for Mango!
@MichaelStroh: Thanks for the shout-out. I've got some really great feedback on that post.
@jburch: Apologies. No news yet...but I wouldn't draw the conclusion that you'll never have Mango. This is unfortunately just taking a little time to sort out. But there are a bunch of people at multiple companies working on the solution.
@Juan: Sorry, I don't know the specifics off the top of my head. But, as I said in a comment a few weeks ago, we do give phone makers and operators the ability to customize a few settings and add custom apps, for example. This might be a case of that.
Any news on finding out why the HTC HD7 only has English as an option under Settings -> Keyboard -> Keyboard Languages? And this is after installing 7392.
Wow... skipped right over me, Michael.
@Freypal: Thanks for the note! These folks here have a lot of fun working on the phone. Glad it came across in this post.
@myan: Thanks for providing the independent test results. :-)
@gfunk84: Will ask.
@Zartan: Cool idea!
@Jason: Thanks for the insightful comment. And kudos on windowsphonemetro.com. Has everyone seen Jason's site? If not, run over and check it out RIGHT NOW. I was impressed by the quality of news, tips, and writing, and I even learned two things from your "10 Things You May Not Know About Your Windows Phone" post. (And I wrote a book on the phone!) Nice work, dude.
It's funny telling people about features like this, either currently on or coming to Windows Phone, because they usually say, "Well, I have an app that already does that." What they don't realize is that this stuff is becoming part of the OS, not just an app, which is a big differentiator for Windows Phone. I'd like to see more of this in the TV ads. People should know how well Windows Phone works out of the box; they don't need to download a ton of apps to get the functionality that other platforms offer. It will be tough to walk the fine line between doing this and not alienating devs, but it's a key piece of the experience.
This sounds great except for the fact that I have a Samsung Rev 1.4 and can't download any updates which means I will probably never have Mango.
Any word on the issues for the Samsung Rev 1.4?
Very cool to hear that they plan to use Bing Audio Search to find more than just music in the future. I'm hoping that Bing Vision will incorporate more off-the-shelf products over time and maybe even things like clothing and cars. Imagine if people could aim the camera at you and know everything that you're wearing, where to buy it, and how much it costs.
Will this feature be available in Canada (or anywhere else where Zune music has no presence)?
WOW Bing Music Search works great!!! Awesome, kudos MS!
I tried several songs (even pretty strange ones) with the emulator, and all of them were identified. It seems to work even better than expected!
Interesting article. It's great to read about the fun side to development.