A peek behind the scenes at Hotmail

Hi, my name is Arthur de Haan and I am responsible for Test and System Engineering in Windows Live. To kick things off, I’d like to give you a look behind the scenes at Hotmail, and tell you more about what it takes to build, deploy and run the Windows Live Hotmail service on such a massive global scale.

Hosting your mail and data (and our own data!) on our servers is a big responsibility and we take quality, performance, and reliability very seriously. We make significant investments in engineering and infrastructure to help keep Hotmail up and running 24 hours a day, day in and day out, year after year. You will rarely hear about these efforts – you will only read about them on the rare occasion that something goes wrong and our service has run into an issue.

Hotmail is a gigantic service in all dimensions. Here are some of the highlights:

  • We are a worldwide service, delivering localized versions of Hotmail to 59 regional markets, in 36 languages.
  • We host well over 1.3 billion inboxes (some users have multiple inboxes).
  • Over 350 million people are actively using Hotmail on a monthly basis (source: comScore, August 2009).
  • We handle over 3 billion messages a day and filter out over 1 billion spam messages - mail that you never see in your inbox.
  • We are growing storage at over 2 petabytes a month (a petabyte is ~1 million gigabytes or ~1000 terabytes).
  • We currently have over 155 petabytes of storage deployed (70% of storage is taken up with attachments, typically photos).
  • We’re the largest SQL Server 2008 deployment in the world (we monitor and manage many thousands of SQL servers).
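
To put those numbers in perspective, here is a rough back-of-the-envelope calculation that uses only the figures in the list above. It is an illustrative sketch, not an official capacity model: the variable names are made up for this example, and the per-inbox figure simply divides raw deployed storage by the number of inboxes, so it includes redundant copies and unused capacity.

```python
# Figures quoted in the list above (approximate, decimal units).
DEPLOYED_PB = 155          # petabytes of storage deployed
GROWTH_PB_PER_MONTH = 2    # petabytes added each month
ATTACHMENT_SHARE = 0.70    # share of storage taken up by attachments
INBOXES = 1.3e9            # hosted inboxes
GB_PER_PB = 1_000_000      # 1 PB is roughly 1 million GB

# Storage holding attachments, in petabytes.
attachment_pb = DEPLOYED_PB * ATTACHMENT_SHARE

# Raw deployed storage per inbox, in gigabytes (includes redundant copies).
deployed_gb_per_inbox = DEPLOYED_PB * GB_PER_PB / INBOXES

# Growth over a year at the current monthly rate, in petabytes.
yearly_growth_pb = GROWTH_PB_PER_MONTH * 12

print(f"Attachments: about {attachment_pb:.1f} PB")
print(f"Deployed storage per inbox: about {deployed_gb_per_inbox * 1000:.0f} MB")
print(f"Growth per year: about {yearly_growth_pb} PB")
```

In other words, roughly 108 PB of the deployed storage holds attachments, and a year of growth at the current rate adds about 24 PB.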

You can imagine that the Hotmail user interface you see in the browser is only the tip of the iceberg – a lot of innovations happen beneath the surface. In this post I will give a high level overview of how the system is architected. We will do deeper dives into some specific features in later posts.

Architecture

Hotmail and our other Windows Live services are hosted in multiple datacenters around the world. Our Hotmail service is organized in logical “scale units,” or clusters. Furthermore, Hotmail has infrastructure that is shared between the clusters in each datacenter:

  • Servers to handle incoming and outgoing mail.
  • Spam filters (we will talk more about spam in a future blog post).
  • Data storage and aggregation from our service health monitoring systems.
  • Monitoring and incident response infrastructure.
  • Infrastructure to manage automated code deployment and configuration updates.

A cluster hosts millions of users (how many depends on the age of the hardware) and is a self-contained set of servers including:

  • Frontend servers – Servers that check for viruses and host the code that talks to your browser or mail client, using protocols such as POP3 and DeltaSync.
  • Backend servers – SQL and file storage servers, spam filters, storage of monitoring and spam data, directory agents, and servers handling inbound and outbound mail.
  • Load balancers – Hardware and software used to distribute the load more evenly for faster performance.
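
As a simplified illustration of what "distributing the load more evenly" can mean, the sketch below models a cluster with a few frontend servers and sends each new request to the least busy one. This is an assumption made for the example only: the post does not say which balancing strategy Hotmail uses, and the class names, server names, and request counts are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Frontend:
    name: str
    active_requests: int = 0


@dataclass
class Cluster:
    name: str
    frontends: list[Frontend] = field(default_factory=list)

    def route(self) -> Frontend:
        """Send the next request to the frontend with the fewest active requests."""
        target = min(self.frontends, key=lambda f: f.active_requests)
        target.active_requests += 1
        return target


# Hypothetical cluster with three unevenly loaded frontends.
cluster = Cluster(
    "cluster-a",
    [Frontend("fe1", 3), Frontend("fe2", 1), Frontend("fe3", 2)],
)

# Route four new requests and record which frontend received each one.
picked = [cluster.route().name for _ in range(4)]
loads = {f.name: f.active_requests for f in cluster.frontends}
print(picked, loads)
```

After a few requests the load on the three frontends evens out, which is the effect a "least connections" policy aims for. Real load balancers also have to account for server health, session affinity, and hardware differences.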

Preventing outages and data loss is our top priority and we take utmost care to keep them from happening. We’ve designed our service to handle failure – our assumption is that anything that can fail will do so eventually. We do have hardware failures: with hundreds of thousands of hard drives in use, some are bound to fail. Fortunately, because of the architecture and failure management processes we have in place, customers rarely experience any impact from these failures.

Here are a few of the ways we keep failures contained:

  • Redundancy – We use a combination of SQL Server storage arrays to host our data, with active/passive failover technologies. This is a fancy way of saying that we have multiple servers and copies of your data that are constantly synchronized. If one server has a failure, another one is ready to take over in seconds. All in all we keep four copies of your data on multiple drives and servers to minimize the chance of data loss due to a hardware failure. Another benefit of this architecture is that we can perform planned maintenance (such as deploying code updates or security patches) without downtime for you. Key pieces of our network gear are also duplicated to minimize the chance of network-related outages.
  • Monitoring – We have an elaborate system for monitoring hardware and software. Thousands of servers monitor service health, transactions (for example, sending an e-mail) and system performance for customers all over the world. Because we’re so large, we’re tracking performance and uptime metrics in aggregate as well as at the cluster level, and by geography. We do want to make sure that your individual experiences are reflected back to us, and not getting lost when we look at averages for the entire system. We care about every single user’s experience. We’ll talk more about performance and monitoring in a future post.
  • Response center – We have a round-the-clock response center team that watches over our global monitoring systems and takes action immediately when there is a problem. We have an escalation process that can engage our engineering staff within a few minutes when needed.
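
The point about averages in the monitoring item can be shown with a small example. The sketch below uses made-up transaction counts and an assumed 99% alert threshold (neither comes from the actual Hotmail monitoring system) to show how a healthy-looking global number can hide one cluster that is having a bad hour.

```python
# Hypothetical send-mail transaction counts per cluster: (succeeded, attempted).
results = {
    "cluster-eu-1": (99_950, 100_000),
    "cluster-us-1": (199_900, 200_000),
    "cluster-asia-1": (9_000, 10_000),  # a small cluster having a bad hour
}

ALERT_THRESHOLD = 0.99  # assumed target success rate, for illustration only


def success_rate(ok: int, total: int) -> float:
    """Fraction of attempted transactions that succeeded."""
    return ok / total


# Aggregate view: all clusters added together.
overall_ok = sum(ok for ok, _ in results.values())
overall_total = sum(total for _, total in results.values())
overall = success_rate(overall_ok, overall_total)

# Per-cluster view: flag every cluster that falls below the threshold.
unhealthy = [
    name
    for name, (ok, total) in results.items()
    if success_rate(ok, total) < ALERT_THRESHOLD
]

print(f"Overall success rate: {overall:.2%}")
print(f"Clusters below threshold: {unhealthy}")
```

The overall success rate stays above the threshold even though one cluster is failing one transaction in ten, which is why the metrics are tracked at the cluster level and by geography as well as in aggregate.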

Engineering process

I’ve talked a little bit about our architecture and the steps we are taking to ensure uninterrupted service. No service is static, however; in addition to growth due to usage, we push out updates on a regular basis. So our engineering processes are just as important as our architecture in providing you with a great service. From patches to minor updates to major releases, we take a lot of precautions during our development and rollout process.

Testing and deployment – For every developer on our staff we have a test engineer who works hand in hand with him or her to give input on the design and specs, set up a test infrastructure, write and automate test cases for new features, and measure quality. When we talk about quality, we mean it in the broadest definition of the word: not just stability and reliability, but also ease of use, performance, security, accessibility (for customers with disabilities), privacy, scalability, and functionality in all browsers and clients that we support, worldwide. Given our scale, this is not an easy feat.

And because we’re a free service funded largely by advertising, we need to be highly efficient on an operational basis. So deployment, configuration, and maintenance of our systems are highly automated. Automation also reduces the risk of human error.

Code deployment and change management – We have thousands of servers in our test lab where we deploy and test code well before it goes live to our customers. In the datacenter we have some clusters reserved for testing “dogfood” and beta versions in the final stages of a project. We test every change in our labs, be it a code update, hardware change or security patch, before deploying it to customers.

After all the engineering teams have signed off on a release (including Test and System Engineering) we start gradually upgrading the clusters in the datacenter to push the changes out to customers worldwide. Typically we do this over a period of a few months – not only because it takes time to perform the upgrades without affecting customers with downtime, but also because it lets us watch for any loss of quality or performance along the way.
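
A gradual, cluster-by-cluster upgrade of this kind can be sketched as a loop that upgrades one cluster, checks its health, and stops as soon as something looks wrong. The code below is a minimal illustration under stated assumptions, not the actual deployment tooling: the function name, the cluster names, and the health check are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def staged_rollout(
    clusters: List[str],
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Upgrade clusters one at a time and stop at the first unhealthy one.

    Returns the clusters that were upgraded and passed the health check,
    plus the cluster where the rollout halted (None if it completed).
    """
    completed: List[str] = []
    for cluster in clusters:
        upgrade(cluster)
        if not is_healthy(cluster):
            return completed, cluster  # halt; later clusters stay untouched
        completed.append(cluster)
    return completed, None


# Hypothetical rollout order: test clusters first, then production.
order = ["dogfood", "beta", "prod-1", "prod-2"]
touched: List[str] = []

completed, halted_at = staged_rollout(
    order,
    upgrade=touched.append,  # stand-in for the real upgrade step
    is_healthy=lambda name: name != "prod-1",  # pretend prod-1 regresses
)

print(f"Upgraded and healthy: {completed}")
print(f"Rollout halted at: {halted_at}")
```

Because the rollout stops at the first unhealthy cluster, a regression only reaches the customers hosted on that cluster, and the clusters later in the order keep running the previous version.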

We can also turn individual features on or off. Sometimes we deploy updates but postpone or delay turning them on. In rare cases we have temporarily turned features off, say for security or performance reasons.
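
Separating "deployed" from "turned on" is often done with feature switches. The sketch below is one simple way to model that idea; the class, its methods, and the feature name are hypothetical and are not taken from the Hotmail codebase.

```python
class FeatureFlags:
    """Tracks which deployed features are currently switched on."""

    def __init__(self) -> None:
        self._enabled: dict = {}

    def deploy(self, name: str) -> None:
        # Newly deployed features start switched off ("deployed dark").
        self._enabled.setdefault(name, False)

    def turn_on(self, name: str) -> None:
        if name not in self._enabled:
            raise KeyError(f"{name} has not been deployed")
        self._enabled[name] = True

    def turn_off(self, name: str) -> None:
        if name not in self._enabled:
            raise KeyError(f"{name} has not been deployed")
        self._enabled[name] = False

    def is_on(self, name: str) -> bool:
        # Unknown features are treated as off.
        return self._enabled.get(name, False)


flags = FeatureFlags()
flags.deploy("new_inbox_view")                  # code is live but hidden
state_after_deploy = flags.is_on("new_inbox_view")

flags.turn_on("new_inbox_view")                 # switch it on later
state_after_on = flags.is_on("new_inbox_view")

flags.turn_off("new_inbox_view")                # switch it off again if needed
state_after_off = flags.is_on("new_inbox_view")

print(state_after_deploy, state_after_on, state_after_off)
```

The useful property is that turning a feature off does not require a new code deployment, so it can be done quickly when a security or performance problem shows up.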

Conclusion

This should begin to give you a sense of the size and scope of the engineering that goes into delivering and maintaining the Hotmail service. We are committed to engineering excellence and continuous improvements of our services for you. We continue to learn as the service grows, and we take all your feedback seriously, so do leave me a comment with your thoughts and questions. I am passionate about our services and so are all the members of the Windows Live team – we may be engineers but we use the services ourselves, along with hundreds of millions of our customers.

Arthur de Haan
Director, Windows Live Test and System Engineering

24 Comments
  • It's really interesting to hear about not only how a large scale system is built but also with the emphasis on the test side. Having a 1:1 dev:test ratio is rare in the industry. The test team must have a lot to do!

    I'd be interested in knowing what the test automation approach is for a lab that has thousands of test machines. Is this off-the-shelf test automation or home grown? Are there standard SOA test tools in MS or is this mostly an innovation from the Hotmail team, as presumably the largest online service team in the company?

  • @simbosan:

    If you can let me know the ISP that your partner is using, we can attempt to contact them and help them overcome the current issues. (I just need the part of the account name that comes after the @ sign.)

    I’ll offer a bit of a clarification: We don’t just arbitrarily limit the amount of mail that can come from a given sender or ISP; we limit ISPs that appear to be sending large amounts of spam or that have an unknown reputation. The good news is that we have clear guidelines for senders and ISPs to overcome the limits and make sure email from their legitimate users gets through. You can check out the guidelines at http://postmaster.hotmail.com

    In any case, if you can let us know the ISP, we can take a look. Thanks.

  • dfd9880
    1 Posts

    I enjoyed this piece.  a clear description of the complexities of a world-wide deployed system - One I have used for many years.

    Also, I want to echo some of the last post.  Please don't mess up my hotmail account with too much innovation

  • Obviously spam is a major problem, but I think a recent development has gone badly wrong to the extent that I am going to have to leave hotmail.  It's the 'rate limit exceeded' problem and it regularly and seemingly randomly blocks emails from my partner.

    The most basic requirement of any system is that you can reliably send and receive emails from people you know and trust.  Hotmail no longer does this, and so I am going to have to dump it.

    The current Hotmail position is basically impractical, the whole sender id business is an absurd requirement for two people to be able to email each other.  

    I realise the spam issue is big and complicated and without 100% guarantee, but the balancing act must err on the side of caution, right now Hotmail is throwing the baby out with the bathwater, for a LOT of people.

    It's been happening for just a few months, whatever you changed, ditch it and fast.

    I like Hotmail, I have had this account for err.. a ... LONG time and I don't want to change.  I am being forced out though.

    Thanks for taking the time to blog this though, the time and effort is appreciated.

    Simon

  • Kit
    23 Posts

    @Eric Doerr

    Thanks for the reply, but I started to wonder why Hotmail attachments as well as SkyDrive files take so long to download?

    I am getting 30~40kb/sec, not sure if the https did anything to it or the lack of CDN for personal contents.

  • @Kit - You got it exactly right. SkyDrive keeps 4 copies of your data in the cloud too, just like Hotmail. I'll do a post soon on some of the behind the scenes work we do in SkyDrive, but I wanted to respond to this to let you know. Thanks for using SkyDrive!

  • Kit
    23 Posts

    I wonder does the skydrive have the same four copies on the cloud or less? I have about 4GB of data on it, that means 16GB for Windows Live! That is pretty massive, really huge requirement on HDD spaces.

  • In looking through the comments, sounds like IMAP and calendar sync to mobile are coming across quite clearly.  For IMAP, which clients do you use to connect?  Is it mostly phones and Outlook or are there others?

  • Thanks for an excellent blog... will be looking forward to new updates.... my wish list is...

    The ability to create folders within folders in my inbox in hotmail(Live).... IMAP (Outlook connector is a bit unreliable).... The ability to sync my calendar with my Win 6.5 Mobile device without having to use Outlook.

  • wow!!

    good work guys. i am looking forward for the 4th wave of updates to windows live. and hope it includes things like auto spell checking. larger attachments. and better contact management

  • sp1der
    35 Posts

    Features request:

    - Sync Over The Air between Live calendar and Mobile Calendar

    - Enable IMAP!!!!!!

  • aldo
    2 Posts

    Good information to know, very interesting :) I sure hope though you guys will work on your spam filter though, it used to be very bad, then it actually worked very well, but now it doesn't do too well. Like when I mark messages as spam, when I get a message of the like (or even the same) the spam filter still doesn't get it! :/

    Anyways, great work guys.

    Cheers!

  • Thanks for sharing the inner workings. I love DeltaSync. I wish that the Hotmail team would make it possible to select folders which would not synchronize with a desktop client; with 5GB of storage, there's no need for all of it to sync with Outlook or the Windows Live Mail client; also, we should be able to move messages between Inboxes for linked accounts.

  • In a world of shallow meaningless content it's a delight to find a serious real world blog.

    I appreciate this content.  It's meaningful.  Keep up the good work.

  • Darreno
    1 Posts

    Hi dude,

    Very good article for me even I am a junior .net developer but i take software maintenance and testing seriously. A systematic and consistent way of handling huge data from different users all around the world is fairly as important as the features hotmail offer for me. Keep going....cheers.

  • jamiet
    45 Posts

    stuartbennett, Islander

    Linking your accounts is available at account.live.com/ManageLinks.aspx

    You still need to switch accounts so it doesn't do *exactly* what stuartbennett asks for but it's pretty damn close.

  • Islander
    46 Posts

    @stuartbennett: Linking your Hotmail accounts is already possible, right now I don't remember where exactly but it's definitely somewhere within Account Settings.

  • feature request for hotmail:

    -possibility to attach file larger than 20mb (till about 1gb) with auto support of skydrive...for me is simply now but other people would like to have an automation ;)

    -possibility that outlook 2010 will support natively live services like windows live mail and not with outlook connector that doesn't work always too well

    -insert in live menu sites in the voice more ALL live services!!!

    -possibility to ad friends profile direct in messenger without connect to website

  • feature request:

    i have 2 hotmail accounts, when i am at home i use windows live mail to get all my messages from both accounts plus my sky email account so they are all in one place but for example ill be round my sisters house over christmas meaning i got to sign into each account separately, for sky vs hotmail i understand that cause they're not from the same provider but with hotmail it seems totally stupid to have to sign out of 1 windows live id and into another just to check different hotmail accounts, why can you not just give each person a single windows live id on which they can have several mailboxes linked to it so you just expand the mailbox you want in the left hand side nav bar and voilà your mails there.

    and yes being able to sort my mail by any criteria like i can in windows live mail and used to be able to do in msn hotmail would be great.

  • Manan
    4 Posts

    Proud to be a Hotmail user! I even say that in my signature! love the service, though the web UI could be made a bit more responsive :) And I'm looking forward to seeing integration with Office Web Apps :)

  • anonymuos
    87 Posts

    Feature requests:

    - Office Web Apps integration with Hotmail, so I can quickly open a Word document in Word web app

    - Mini To-do/tasks and calendar on the same page as email

    - IMAP IMAP IMAP IMAP IMAP!! Secure IMAP too.

    - Free forwarding to any other third party email accounts (so I am not forced to vendor lock-in)

    - Set number of messages per page (this used to be available back when it was MSN Hotmail)

    - Maximum attachment size at least 20 MB

    - Sort by any criteria in Ascending as well as Descending order (like it was available in MSN Hotmail)

    - Better experience of uploading/attaching multiple email attachments.

  • I don't mind DeltaSync so much but it has to become as reliable and fast as IMAP. So far, it doesn't look like that's happening even with the new Outlook Connector.

  • noroom
    9 Posts

    Aw, and I wanted to ask how hard it is to offer us IMAP, like GMail does (and probably many more too). Turns out, it might be harder than I thought.

    DeltaSync is proprietary, and I want to be able to check my email using any client on any platform. :(

    Please?