Now SmartScreen automatically identifies more than one billion newsletters every day

SmartScreen®—it’s not just for spam anymore. The latest release of Hotmail uses Microsoft SmartScreen to automatically identify more than a billion newsletters every day. Since newsletters account for more than a quarter of all the mail in a typical inbox, having them automatically categorized is a big time-saver.

This post walks you through how we took SmartScreen and trained it to identify not just spam but also a specific kind of graymail, newsletters, to help customers stop spam and manage graymail.

Graph showing Hotmail Inbox 2006

When inbox spam was at 30%, our job was really clear—our enemy, clever as he remains, was impossible to miss. We made huge investments in SmartScreen and reduced spam to historic lows of less than 3%.

With spam at manageable levels, we began looking at the rest of the inbox, and what we found was pretty surprising.

Graph showing Hotmail Inbox 2012

We could easily tell which messages were person-to-person, and we identified spam getting past our filters. The majority of what was left was something we refer to as graymail, and when thinking about how to deal with graymail, it became clear that the fundamental problem wasn’t just which things to accept or reject. Unlike spam, which everyone wants to be rid of, there is no general agreement on how to deal with graymail.

We believe the solution lies in delivering features that enable you to manage your graymail. With that in mind we introduced powerful new tools, including Sweep, Scheduled Cleanup, special views of the inbox, and other enhancements to put you in charge.

However, as cool as these tools are, they require maintenance to stay current and rely on you to identify the messages to be managed. We know you’ve got a busy life, so we wanted to do more.

Automatically classifying graymail

The basic idea is to identify what a message is before you see it, and to take special actions on the message where it makes sense to do so. At its core, this is not a new concept. SmartScreen already classifies and flags messages as spam and/or malicious and tells the message delivery system how to handle the message.

For example, based on the threat posed by a given message, SmartScreen may decide to:

  • Deliver a message from someone you don’t know to the inbox, but allow you to decide whether you want to view the entire message[1].
  • Mark a message as spam and deliver it to the Junk folder.
  • Reject a message that contains dangerous code or is from a known bad sender.
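
The three outcomes above can be pictured as a small decision function. This is a minimal sketch only: the `threat_score` scale, the thresholds, and the names `Action` and `choose_action` are assumptions made for illustration, not SmartScreen's actual interface.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver to the inbox"
    DELIVER_WITH_PROMPT = "deliver to the inbox, let the user decide whether to view it all"
    JUNK = "mark as spam and deliver to the Junk folder"
    REJECT = "reject the message"


def choose_action(threat_score: float, known_sender: bool) -> Action:
    """Map a hypothetical threat score in [0, 1] to a delivery action."""
    if threat_score >= 0.9:   # assumed cutoff for dangerous code or known bad senders
        return Action.REJECT
    if threat_score >= 0.5:   # assumed cutoff for ordinary spam
        return Action.JUNK
    if not known_sender:      # low threat, but from someone you don't know
        return Action.DELIVER_WITH_PROMPT
    return Action.DELIVER
```

The point of the sketch is the shape of the decision: one classification step feeds several possible delivery outcomes, which is what makes it natural to add graymail categories next to the spam verdicts.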

We learned a lot in the fight against spam, and since the infrastructure was already in place, it made a lot of sense to apply those lessons and our new tools to graymail management. By automatically categorizing graymail, we can make Sweep, Scheduled Cleanup, and all the other cool new tools even better. The big question was where to start.

Graph showing graymail breakdown

When we looked at the graymail portion of the inbox (a whopping 82%!), a few things immediately jumped out. Social networking has really become a big part of everyone’s lives in the last couple of years, and the email notifications associated with Facebook, Twitter, and other popular sites have become a large part of people’s inboxes as well. Fortunately, the most prevalent senders in that category are well-known, don’t change that often, and are easy to detect, so we shipped the Social Updates view in the last release of Hotmail.

However, we knew there was a bigger prize, a segment of email so pervasive and chatty that it completely dwarfed social updates—to the tune of 50% of some folks’ inboxes!

Every day the average person’s inbox is flooded with messages from thousands of different retailers, clubs, societies, and schools, or with coupons, deals, and notifications from deal aggregators talking about all the exciting things that people need to be buying, doing, or seeing. We refer to this subset of graymail as “newsletters.”

Newsletters are unlike notifications from Facebook or Twitter, which always come from the same email address, always look the same, and mostly contain the same content. Newsletters can be extremely diverse: anyone can send them, and they can use any format or content the sender likes.

Dealing with that diversity meant we needed a different approach from the one we took for social updates. And, because that diversity is a trait shared by other categories of graymail, we wanted to build something that could grow beyond newsletters.

Building the newsletter filter

To get Hotmail to identify newsletters for us, we began by making a list of newsletter characteristics and built a piece of software to extract them from incoming emails. This list forms the model of what makes newsletters different from all other mail and includes three aspects: presence of the List-Unsubscribe header, the sending email address, and what gets shown to the user.
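
As an illustration of what extracting these characteristics might look like, here is a minimal sketch using Python's standard `email` package. The function name `extract_features`, the feature names, and the sample message are assumptions for this example; the real extractor is not described in this post, and "what gets shown to the user" is approximated here by the first plain-text part.

```python
from email import message_from_string
from email.utils import parseaddr


def extract_features(raw_message: str) -> dict:
    """Pull the three kinds of signal named above out of one raw message."""
    msg = message_from_string(raw_message)

    # The sending email address, split so the domain can be used on its own.
    _, address = parseaddr(msg.get("From", ""))
    address = address.lower()
    domain = address.rpartition("@")[2]

    # Approximation of what the user sees: the first text/plain part.
    visible_text = ""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            visible_text = str(part.get_payload()).strip()
            break

    return {
        "has_list_unsubscribe": msg["List-Unsubscribe"] is not None,
        "sender_address": address,
        "sender_domain": domain,
        "visible_text": visible_text,
    }


# Hypothetical sample message, for illustration only.
SAMPLE = (
    "From: Deals <news@shop.example>\n"
    "List-Unsubscribe: <mailto:unsub@shop.example>\n"
    "Subject: Weekly deals\n"
    "\n"
    "Save 20% this week.\n"
)

features = extract_features(SAMPLE)
```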

With a clear definition of what we considered a newsletter, we created a reference set of about 10,000 messages that we classified as “newsletter” or “not a newsletter.” Think of the reference set as a test for our newsletter filter: the rate at which it correctly identifies newsletters defines its accuracy.

Using a technique called machine learning, we built a system that trained and adjusted the model until it reliably detected most of the newsletters in the reference set. Because the reference set was built from a completely random sample, we knew that the filter’s performance against it would very closely approximate “real world” performance. Once we were detecting most of the reference set’s newsletters, we began an internal pilot of the feature in September 2011—we call this “dogfooding.”
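
The train-and-adjust loop can be sketched in a few lines. The post does not say which learning algorithm is used, so this example substitutes a simple perceptron purely for illustration. The three binary features, the six-message reference set, and the 0.9 target are all made up for the example.

```python
def predict(weights, bias, features):
    """Return 1 (newsletter) if the weighted score is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def detection_rate(weights, bias, reference_set):
    """Fraction of the labeled newsletters that the model detects."""
    newsletters = [f for f, label in reference_set if label == 1]
    hits = sum(predict(weights, bias, f) for f in newsletters)
    return hits / len(newsletters)


def train(reference_set, target=0.9, max_epochs=100):
    """Adjust the model until it detects most reference-set newsletters."""
    weights = [0.0] * len(reference_set[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        for features, label in reference_set:
            error = label - predict(weights, bias, features)  # -1, 0, or 1
            weights = [w + error * x for w, x in zip(weights, features)]
            bias += error
        if detection_rate(weights, bias, reference_set) >= target:
            break
    return weights, bias


# Hypothetical features: (has List-Unsubscribe, bulk sender, many links).
REFERENCE_SET = [
    ((1, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 0), 1),  # newsletter
    ((0, 0, 0), 0), ((0, 1, 0), 0), ((0, 0, 1), 0),  # not a newsletter
]

weights, bias = train(REFERENCE_SET)
```

A real reference set would be far larger (about 10,000 messages, as described above), and performance would be measured on messages the model had not been trained on.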

Eating dogfood

“Dogfooding,” the process of using our own employees to test new software using our real email accounts, was crucial to identifying and fixing problems with the filter. We provided the dogfood users with a way to report missed and incorrectly identified newsletters just as we do for the occasional spam message that gets through our filters. We spent several weeks analyzing the failures and adjusting the model until we’d worked out the known kinks.

For example, a major problem we identified early on was that financial services businesses tend to send all their mail from the same domain, and that mail often carries a lot of boilerplate language that closely resembles a newsletter even when the message is not one. Rather than take the risk of filing away your bank statements, we decided it was better to leave these messages alone and trained the newsletter filter to ignore them.

How well does this work?

In general, spammers are pretty indiscriminate and don’t think too hard about whether to send you a ton of offers for Rolex watches, cheap loans, or pharmaceuticals. With minor differences, everyone gets pretty much the same spam. The interesting thing about graymail is that you accumulate it over time, based almost exclusively on what you do online, and so every inbox is different.

We designed the newsletter filter to perform well for the average person’s inbox: correctly identify most of the newsletters most of the time. But this doesn’t mean we didn’t aim high. Let’s look at the data. Most newsletters are sent out on weekdays, about 1.5B newsletters are sent per day, and newsletters make up about half of all email delivered to our servers. The newsletters we identify represent 73% of the newsletters in an average person’s inbox (36% of all their email), and when we think a message is a newsletter, we’re right 97% of the time.
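
To see how these percentages fit together, here is a short worked calculation. It assumes that the 36% figure counts correctly detected newsletters, so 73% acts as the detection rate (recall) and 97% as the precision. The function name and the 1,000-message inbox are illustrative choices only.

```python
def inbox_breakdown(recall, detected_share, precision, inbox_size=1000):
    """Turn the three published rates into message counts for one inbox."""
    # If detected newsletters are `recall` of all newsletters, then
    # all newsletters make up detected_share / recall of the inbox.
    newsletter_share = detected_share / recall
    # If `precision` of flagged messages are real newsletters, then
    # everything flagged makes up detected_share / precision of the inbox.
    flagged_share = detected_share / precision
    return {
        "newsletters": round(newsletter_share * inbox_size),
        "detected": round(detected_share * inbox_size),
        "missed": round((newsletter_share - detected_share) * inbox_size),
        "flagged_in_error": round((flagged_share - detected_share) * inbox_size),
    }


breakdown = inbox_breakdown(recall=0.73, detected_share=0.36, precision=0.97)
```

Under these assumptions, a 1,000-message inbox holds roughly 493 newsletters, of which 360 are detected and about 133 are missed, while about 11 other messages are flagged in error. That is consistent with newsletters making up about half of all mail.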

Graph showing newsletter detection

Getting this right allows you to filter or sweep these messages quickly, which means you can spend more time reading and responding to email than reorganizing it.

Using Hotmail’s categorization tool, you can change the categorization of a message, for example by marking or unmarking it as a newsletter. This generates feedback that the newsletter filter learns from, so it’s able to overcome previous mistakes as well as stay on top of new newsletters. This means the rules you set up to deal with newsletters apply not just to the newsletters you already receive, but also to new ones that appear after you’ve set up those rules. The best part is that SmartScreen learns from what customers do with their newsletters, and everyone benefits as the filter gets smarter!
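
One way to picture the per-user side of this feedback is a table of corrections that takes priority over the filter's own verdict. This is a minimal sketch under that assumption: the class `CategoryOverrides` and its methods are hypothetical, and it does not show how feedback is aggregated to retrain the shared filter.

```python
class CategoryOverrides:
    """Remember a user's manual corrections, keyed by sender address."""

    def __init__(self):
        self._by_sender = {}

    def record_feedback(self, sender: str, is_newsletter: bool) -> None:
        # Called when the user marks or unmarks a message as a newsletter.
        self._by_sender[sender.lower()] = is_newsletter

    def categorize(self, sender: str, filter_verdict: bool) -> bool:
        # A recorded correction wins; otherwise fall back to the filter.
        return self._by_sender.get(sender.lower(), filter_verdict)


overrides = CategoryOverrides()
overrides.record_feedback("Statements@bank.example", False)
```

With that correction recorded, later mail from the same sender keeps the user's choice even if the filter disagrees, while mail from other senders still gets the filter's verdict.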

What’s next?

With the newsletter filter now in the hands of all our customers, we will continue adding new categories and features that enable you to get the most out of them. We’re investigating ways to more effectively present and manage email-based receipts, bank statements, and more. We hope the newsletter filter can be a helpful tool in your own war on graymail. We love getting your feedback, so let us know how it’s working for you, and, as always, thanks for using Hotmail.

Dick Craddock, Group Program Manager Hotmail


[1] Note: We will be changing this for the better in an upcoming release. Hotmail will soon use domain reputation to decide which messages to “light up” by default, lessening the burden on customers.


21 Comments
  • It might be off-topic, but I have a few ideas for Windows Live:

    *the ability to add favorite TV Shows, Movies, Comics, Books, Authors, Music, Artists, Albums, and Songs to our Live account.  With this, we can keep track of our favorite shows, movies, and music, and get notified when new episodes or songs are released so we can go purchase them from Zune, and when nearby concerts are posted by our favorite artists.  This would also be linked with Zune and our Zune Passes so we can listen to music from there and earn badges from Zune.  Comics would link to DC, Marvel, etc., as would books so we can get notices when new comic issues are coming out, and when new books by favorite authors are coming out.

    *Add a "Places" category so that we can use those on our Live Calendar when creating an event so we can get directions to whatever event it is.

    *show my contacts' houses on my map

    *allow sharing of favorites (movies, TV Shows, books, comics, authors, artists, music, albums, songs, restaurants, tourist attractions) so that we can comment on friends' favorites, see what they've recently watched or listened to, and send them movies and shows, and have listening / watching parties with them through Windows Live Messenger or Zune so we can all watch stuff together even if we're in different cities.  If we share a place we've been, such as the Capitol building, link it to pictures we've tagged that are in SkyDrive, so our friends can see and comment on those, and if a friend searches for, let's say, Savannah GA restaurants, they can see that I've been there, the places I went to in Savannah that I shared, and any pictures I took there.  This is partially possible in Zune with the Zune Social, but that should be expanded greatly so we can share whatever else we like too.

  • @limitless - please send me a private message with your account information and the subject line of the newsletter you're referring to. thanks!

  • Hi everyone, a couple of comments on newsletters that I think will become FAQs.

    ::POP vs. Forwarding

    When Hotmail is configured to retrieve messages via POP, they bypass the newsletter detector because they take a different path to your mailbox than those sent directly to your Hotmail account. If (like me) you have many personal email accounts, and want to take full advantage of the newsletter detector, here's what you do:

    1. In the Hotmail Options page, add your other accounts to the list of accounts Hotmail can send mail from. This tells Hotmail that you own those accounts and helps smooth delivery. Disable POP retrieval from those accounts if you've set that up.

    2. Go to your other accounts and configure them to forward inbound email to your Hotmail account. This will cause your non-Hotmail accounts to relay mail to Hotmail, and most of the time this means relayed mails will appear as if they were sent directly to your Hotmail account. In a limited number of cases the forwarded emails will not be detected as newsletters, and this is usually due to the forwarding account physically changing the messages, which brings me to my next point.

    ::How To Report Miscategorized Newsletters

    All you have to do is change the category. In the case where you categorized something as a newsletter, we should also be creating a rule (you can see these from the Options page "Rules for sorting new messages") that auto-categorizes future messages from that sender.

    If you feel like you've found a bug or that something's not working right, you can post the issue on the blog or send me a direct message. Keep in mind I can't provide ongoing technical support (one of me, many of you) but your feedback is always welcome.

  • @Paul Midgen,

    I have the same problem: only a few mails are detected as newsletters. Also, after I categorize an undetected newsletter as such, Hotmail still doesn't detect that newsletter the next day.

    Could this happen because most mails are in German?

    (mails are automatically forwarded from another account)

  • @jobinoy - looking at both your comments, if the messages you're referring to are all retrieved over POP then that's the reason. POP messages (for the time being) take a different path to the inbox than messages delivered via SMTP and for that reason the newsletter classifier doesn't run on them. if you want me to look at some of the messages in your inbox, send me a private message with your Windows Live ID.

  • Jobinoy
    9 Posts

    Could it be that this doesn't work for emails that I receive from another account over POP?

  • Jobinoy
    9 Posts

    Hey Hotmail Team

    I love the changes you are making. Unfortunately, my Hotmail has not classified a single email as a newsletter since the new features were turned on for my profile, even though I mark every newsletter by hand when it comes in. It looks like there is a problem with my account, or am I doing something wrong?

  • In addition to langware's suggestion, something along the lines of Gmail's two-step verification might also be helpful (using one's cell phone to receive a code to complete login; in the case of those accessing their hotmail from one home computer, the code would be good for a month).  I prefer a few ounces of prevention to a pound of cure (esp. considering the nightmare some users go through to try to reclaim hijacked accounts !).

  • langware
    154 Posts

    @Paul Midgen,

    Thank you for your comprehensive response.

    Hopefully, progress will quickly be made on achieving your long-term goal of proactively identifying and rejecting reply-to phishing messages before they are delivered.

    One more question that is not too far off topic:

    In order to address the problem of account hijackings, is an option to define one's default IP address being considered? If a customer selects this option, then whenever an attempt is made to access their account from an IP address other than the default, additional security would be applied (i.e., additional security questions, etc). Many financial institutions use this technique. Yes, many customers use DHCP and their IP address changes over time, but those customers who choose to use this option could re-set their defined IP address when it changes (or resetting could be done automatically for them).

    Just a thought ... no single technique is 100% effective, but by providing additional security options you can reduce the probability of customers' accounts being hijacked.

  • @langware

    Thanks for bringing this up, it's a great question.

    We call this type of solicitation "reply-to phish", because the recipient is asked to provide information like credentials, social security numbers, credit card info, etc. to verify something or other and/or avoid some kind of penalty like account closure, additional charges, and so on.

    Despite services like Hotmail, Gmail, PayPal, and so on repeatedly saying that we'll never send emails asking for user passwords, as you point out the seemingly legitimate appearance of these messages tricks enough people that the bad guys keep doing it. We're keenly aware of the problem space and very motivated to provide an effective and durable solution, especially in the face of the internet's account hijacking pandemic.

    In the cat & mouse game with phishers, we have made changes to our filtering systems that trap a lot of this mail, though some still makes it to the inbox. The root of the problem is that while from a user's perspective the messages may all look the same, phishers have learned how to more effectively camouflage themselves. It has gotten to the point where we think that really nailing this problem will require several steps over a couple of releases.

    Initially, we're helping users better protect themselves by detecting messages that may be asking for inappropriate information. Those messages will be flagged with a gentle reminder not to give such information out in email, and this functionality will be turned on within the next couple of weeks.

    Longer-term, our goal is to identify and reject reply-to phish before even a single one is delivered, but we can't start doing that until we're certain that legitimate password reset notifications, account maintenance reminders and so on pass through unhindered. To do that we're planning new filtering technology that's not susceptible to the various camouflage techniques employed by the bad guys.

    In the meantime, we'll maintain our practice of identifying and stopping campaigns as they emerge. This has the effect of stopping campaigns only after they begin, but is very effective in limiting their impact.

    Hotmail users can help here by using the "report as phish" menu item. We use those reports, much like spam reports, to help identify and stop similar messages.

  • I agree with sumitmehta. The ability to work on multiple emails at the same time is awesome, especially if you're able to keep the original message open in a tab while replying to another. Just check gmx.com (my previous mail service) to see how it might work.

    One thing I really think would be great is more rules for scheduling. Right now I can only act on emails by sender; it would be great if I could use other rules, like destination, or a combination of factors, like the rules in the Outlook client.

  • drakey
    1 Posts

    While these are all useful features, does the Hotmail team plan to introduce IP address masking to the platform? The fact that a recipient can just go into the email headers to determine the sender's IP address is a security issue, correct?  On this regard, I think that Gmail has it correct.

    Aside from that glaring weakness I love the improvements!  Keep them coming.

  • controlz
    145 Posts

    @sumitmehta - I also think there should be more themes (Gmail has way better themes at the moment). A theme that was based on the Bing image of the day would be great, and custom themes would be also great. Some themes are ok, but their colours for links etc. are terrible. It would be great to be able to change that.

  • Great efforts, in the right direction. The Hotmail team truly deserves lots of appreciation and congratulations for working tirelessly to regain its leading position in the market. This is really very difficult, but you proved that it is not impossible. Your efforts have really made us consider switching back to Hotmail from Gmail, which is also, no doubt, a great service.

    There are a few things on the appearance front that seem not to be addressed: themes and colors. In my opinion, only the default blue theme is good. The others are not appealing. I have used the same blue theme for more than 2 years and have become bored. Please introduce some good themes to the webmail interface to make it more attractive.

    Another feature that I think would be useful is tabbed browsing. It helps to work on multiple mails at the same time. I hope this is under the Hotmail team's consideration. But overall I am quite satisfied as far as the effectiveness of Hotmail in handling mail is concerned.

  • Dino
    1 Posts

    Thank you, thank you, thank you for this! While I hate spam, I also hate having to constantly look in my spam folder for things that were incorrectly marked as such. Hopefully this will resolve it at last.

  • Looks good.

    I love categories, I wish they would sync with Outlook when I am home using that instead of live.com.

  • JohnCz
    204 Posts

    For years I've set up rules in Hotmail to move messages to a dedicated Newsletters message folder.  I do the same for billing statements, etc.  So it sounds like your filtering will make it easier for folks who don't know how to set up mail rules or don't bother with them...nice job.  This reminds me that I probably should make the switch to the category capabilities added to Hotmail in Oct/Nov.

  • Bruno H
    6 Posts

    This is great. I really like hotmail and have been using it for more than 10 years now. But then something happened the other day that really made me mad and got me starting to think of maybe leaving Hotmail.

    What happened was that my wife's Hotmail account was hijacked. Her Hotmail account sent out spam to all her friends. The spam was sent at a time when her computer was off and she was sleeping! So I guess that somehow somebody got into her Hotmail account. She uses a good, complex password, so I have no idea how this could happen. To be sure, I wanted to check who had accessed her account the day it happened.

    That's when I found out that Microsoft does not provide this service. I had no way of checking her account to find out whether somebody else had accessed it, and from which IP address! The funny thing was that when I searched the net for the words "mail activity log" I quickly found out that Gmail had this service.

    This stinks! Why can't I have a log of the last 10 times that my account was accessed and get at least the IP address of the one accessing it?

    So we changed my wife's password and hope this never happens again. But if it does, we will have to change my wife's mail account. And that day I will be tempted to move my whole family over to Gmail.

    Please take this into consideration for coming updates.

    Bruno Horvat

  • jamiet
    45 Posts

    Hello Dick,

    All the improvements are great and all however they have limited value for me because I triage my email on my phone. That just happens to be a Windows Phone so if you could have a word with your Windows Phone colleagues and get them to introduce the same features to their email client I would be very grateful. Specifically, I would like my Hotmail categories to be synced to my phone and also flagged messages to appear at the top of my inbox.

    Thanks

    JT

  • langware
    154 Posts

    @Dick Craddock,

    Thank you for addressing the issue of graymail. Now, how about addressing the long standing problem of phishing messages that claim to be from the Hotmail Product Team and demand that the recipient disclose their account's credentials (or face account suspension).

    This is not a new problem .... it has existed for years ... and yet these messages (all of which are very similar in content, and thus should be caught by a filter) continue to be delivered to your customers' mailboxes. Unfortunately, some customers believe these phishing messages to be real and respond to them.

    Here is but one example of a long thread (May 2009 to December 2011) in the Windows Live Solution Center with customer complaints about this issue:

    www.windowslivehelp.com/thread.aspx

    Given your success at dealing with graymail, why are phishing messages (like the ones mentioned in the above thread) still being delivered to customers' mailboxes ... why can't filters be developed to quarantine these messages before your customers ever see them .... or (if these phishing messages must be delivered) why not insert a banner into the message warning the recipient that Microsoft will never ask for account credentials within an email?

  • controlz
    145 Posts

    Thank YOU for providing such a great email service - competitors don't come close! I think categories and filters have a lot of potential, and I look forward to the next release of Hotmail (which will hopefully include some much-needed Calendar updates e.g. search, speed improvements, cleaner interface, right click on events etc.).