Ever wonder how Hotmail stores the billions of email messages we receive each day? Keeping our customers’ data safe and readily available is an immense responsibility that we take very seriously. And to do so efficiently at our scale is a sizeable engineering challenge. This post will discuss how we address some of these challenges and reveal some major improvements we’re making in our storage system. Kristof Roomp is an architect in the Hotmail team and has been working on our storage system for the last 6 years.
Hotmail’s storage system supports over one billion mailboxes and hundreds of petabytes of data (one petabyte is a million gigabytes, or a million billion bytes). The system services hundreds of thousands of simultaneous transactions from across the world. Just like the rest of Hotmail, our storage system is built using Microsoft technology, including Windows Server and Microsoft SQL Server. These systems are the backbone of Hotmail and are crucial to meeting the high standards we’ve set for the reliability and availability of our service.
The folks who work on Hotmail storage have three main goals: keeping your emails safe, providing new functionality to the Hotmail service, and running the service as efficiently as possible. In many cases, safety and efficiency go together. For example, by automating routine maintenance tasks and providing monitoring to detect problems before they appear to our users, we can reduce the chance of human error and thereby significantly improve the reliability of our service.
Recently, we’ve been working on a major upgrade to our storage system. Starting at the beginning of this year, we’ve been running the new system on a pilot cluster, using personal accounts of Microsoft employees who have volunteered to be test pilots. We’ve now finished certifying this new system, and are satisfied that it provides better reliability to users at a significantly lower price.
I’ll describe some of the key technologies that we have developed at Hotmail to make this happen.
First, what is RAID?
RAID (Redundant Array of Inexpensive Disks) is a technology that allows several hard drives to be attached to a single controller board, which makes them look like a single larger and much more reliable hard drive (sometimes called a “Logical Unit”) to the software running the storage system. A RAID system stores data on multiple drives so that if a single drive fails, the data can be automatically recovered. Although this sounds great in theory, in practice losing an entire RAID set happens all the time, especially if you have thousands of machines.
In Hotmail, we’ve been using RAID for a long time. In order to avoid losing email messages when a RAID set fails, we keep your email on multiple RAID groups, so that even if an entire RAID set breaks, we can still restore your messages.
However, as we looked at deploying drives with capacity greater than a terabyte, we realized that we weren’t getting our money’s worth from a reliability perspective. The reason had to do with the idea of “correlated” as opposed to “independent” failures.
As an analogy, think about engines on an airplane: there are many failures (such as mechanical problems) that only affect a single engine. These are called independent failures, and having more than one engine is helpful in these situations. However, if you were to run into a big flock of birds or run out of fuel, all engines could fail at the same time. These are called correlated failures, since a single event causes multiple failures.
In a similar way, RAID systems can easily deal with problems that affect single (or two in some configurations) hard drives, but they don’t help if the whole machine or the RAID controller runs into problems. For larger drives, we found that having completely independent copies (on hard drives not sharing the same machine or controller) was much more reliable than a significantly more expensive RAID configuration.
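To make the difference concrete, here is a small back-of-the-envelope sketch. The failure probabilities are made-up numbers for illustration only, not measurements from Hotmail, and the function names are hypothetical. It compares two copies behind one shared controller with two copies on fully separate machines.

```python
def loss_shared(p_drive: float, p_shared: float, copies: int) -> float:
    """Chance of losing data when all copies sit behind ONE shared component.

    p_drive:  assumed chance a single drive fails on its own (independent).
    p_shared: assumed chance the shared controller/machine fails (correlated).
    """
    # Data survives only if the shared component survives
    # and at least one drive survives.
    return 1 - (1 - p_shared) * (1 - p_drive ** copies)


def loss_independent(p_drive: float, p_shared: float, copies: int) -> float:
    """Chance of losing data when every copy has its own drive AND controller."""
    # A single copy is lost if either its drive or its own controller fails.
    p_copy = 1 - (1 - p_drive) * (1 - p_shared)
    # All copies must be lost, and their failures do not influence each other.
    return p_copy ** copies


# Illustrative numbers only: 5% drive failure, 1% controller/machine failure.
shared = loss_shared(0.05, 0.01, copies=2)
separate = loss_independent(0.05, 0.01, copies=2)
print(f"shared controller: {shared:.4%}, separate machines: {separate:.4%}")
```

With these assumed numbers, the shared setup loses data about 1.2% of the time and the fully separate setup about 0.35% of the time. In the shared case the loss is dominated by the single controller, which is exactly the correlated failure that adding more drives behind it cannot fix.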
The new system ensures that the copies of data reside on independent hard drives, controllers, and machines. This kind of system is nicknamed “JBOD,” which stands for “Just a Bunch Of Disks.” In a JBOD system, the hard drive controller almost completely gets out of the way, which means that the software must now worry about all the failures that the controller previously handled. These failures can range from firmware bugs on the hard drives themselves to issues such as “unrecoverable read errors” that previously were automatically fixed by the controllers. In addition, the software must now scrub the drives periodically to check the data for “bit rot” (i.e., data that has for some reason become unreadable or corrupt). So basically, we built a distributed "RAID" controller completely in software, which replaces the industry-standard firmware ones.
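As a rough idea of what scrubbing involves, the sketch below walks over stored blocks and compares each one with a checksum recorded when it was written. This is a generic illustration, not the actual Hotmail scrubber: the in-memory dictionaries, the block ids, and the choice of SHA-256 are all assumptions.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Fingerprint of a block, recorded when the block is first written."""
    return hashlib.sha256(data).hexdigest()


def scrub(blocks: dict[str, bytes], expected: dict[str, str]) -> list[str]:
    """Return the ids of blocks that are missing or no longer match their checksum."""
    damaged = []
    for block_id, want in expected.items():
        data = blocks.get(block_id)
        # A missing block or a checksum mismatch both count as "bit rot" here.
        if data is None or checksum(data) != want:
            damaged.append(block_id)
    return damaged


# Hypothetical example: write two blocks, then corrupt one of them.
blocks = {"msg-1": b"hello", "msg-2": b"world"}
expected = {block_id: checksum(data) for block_id, data in blocks.items()}
blocks["msg-2"] = b"w0rld"  # simulate silent corruption on disk
print(scrub(blocks, expected))  # ['msg-2']
```

A damaged block found this way would then be repaired from one of the other copies.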
The software we developed for the JBOD system monitors the hard drives, detects and diagnoses failures, and schedules repair actions. This software consists of a number of “watchdogs” that constantly monitor for certain types of failures. If a watchdog detects the failure that it is looking for, it raises an alert, which automatically triggers a repair process. This repair process can range from rebooting a machine or restarting a process, to fixing data corruption or even involving a human if progress can’t be made. We'll talk more about our advanced platform for monitoring, deployment, and repair in a subsequent post.
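The watchdog pattern described above can be sketched in a few lines. This is a simplified illustration under assumed names (`Watchdog`, `run_watchdog`, `restart_process`, and the `state` dictionary are all hypothetical), not the real monitoring platform: a check detects one kind of failure, repairs are tried from least to most invasive, and a human is involved only if none of them helps.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Watchdog:
    name: str
    check: Callable[[dict], bool]           # True while the failure is present
    repairs: list[Callable[[dict], None]]   # ordered from least to most invasive


def run_watchdog(dog: Watchdog, state: dict) -> str:
    """Run one watchdog pass and report what happened."""
    if not dog.check(state):
        return "healthy"
    for repair in dog.repairs:
        repair(state)
        # Re-check after every repair step; stop as soon as the failure is gone.
        if not dog.check(state):
            return f"repaired by {repair.__name__}"
    return "escalate to human"


def restart_process(state: dict) -> None:
    """Hypothetical repair action: bring the storage process back up."""
    state["process_running"] = True


state = {"process_running": False}
dog = Watchdog("storage-process", lambda s: not s["process_running"], [restart_process])
print(run_watchdog(dog, state))  # repaired by restart_process
```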
A big advantage of managing the drives in software is that the system knows exactly how many good copies of an email message we have. In the case where it finds that there are too few copies, it can prioritize repair actions to avoid a potentially dangerous situation. In situations where repairs are taking too long, it is possible to move data to another location altogether. This is also possible in RAID in a limited fashion, but it requires that every RAID controller has an extra spare drive hooked up to it, which increases costs significantly.
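One simple way to picture this prioritization is a queue ordered by how many good copies remain. The sketch below is an assumption-laden illustration: the target of three copies and the message ids are invented for the example, and the real system's policy is not described in this post.

```python
import heapq


def repair_order(good_copies: dict[str, int], target: int = 3) -> list[str]:
    """Return under-replicated message ids, most endangered first.

    good_copies maps a message id to the number of healthy copies found.
    target is the desired number of copies (an assumed value).
    """
    heap = [(count, msg_id) for msg_id, count in good_copies.items() if count < target]
    heapq.heapify(heap)
    # Pop in ascending order of healthy copies: fewest copies gets repaired first.
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


print(repair_order({"msg-a": 3, "msg-b": 1, "msg-c": 2}))  # ['msg-b', 'msg-c']
```

A message with a single remaining copy is repaired before one that still has two, and fully replicated messages are not queued at all.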
Building our own distributed system to store replicated email messages was a significant development effort, although the replication itself was simplified by the fact that email messages in Hotmail stay exactly the same as they were when they were delivered (in fact, you can see exactly what is stored in Hotmail if you do a “View message source”). Data about email messages that does change (such as read/unread status, location in a folder, etc.) is stored separately.
The storage system consists of a set of machines, each of which has its own copy of an email message and a journal recording the messages that have arrived, organized by arrival date. The machines talk to each other from time to time, compare their journals, and copy any messages that haven’t yet reached all machines. Messages can be missing from a machine for a variety of reasons, mostly machine, network, or hard drive failures. In some cases, the journals are too far out of sync, in which case the system does a full comparison/copy.
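The journal comparison can be sketched as follows. This is a minimal in-memory model under assumed names (`Machine`, `deliver`, `reconcile`), not the actual protocol: each machine keeps its messages plus a journal of (arrival date, message id) entries, and a reconcile pass copies whatever either side is missing. Because delivered messages never change, a copy never has to be merged or updated, only added.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    messages: dict[str, bytes] = field(default_factory=dict)
    journal: list[tuple[str, str]] = field(default_factory=list)  # (arrival_date, message_id)

    def deliver(self, date: str, msg_id: str, body: bytes) -> None:
        self.messages[msg_id] = body
        self.journal.append((date, msg_id))


def reconcile(a: Machine, b: Machine) -> int:
    """Copy messages that one machine has journaled and the other lacks."""
    copied = 0
    for src, dst in ((a, b), (b, a)):
        known = {msg_id for _, msg_id in dst.journal}
        for date, msg_id in src.journal:
            if msg_id not in known:
                dst.deliver(date, msg_id, src.messages[msg_id])
                copied += 1
        dst.journal.sort()  # keep the journal ordered by arrival date
    return copied


a, b = Machine(), Machine()
a.deliver("2011-01-01", "m1", b"first")
a.deliver("2011-01-02", "m2", b"second")
b.deliver("2011-01-02", "m2", b"second")
b.deliver("2011-01-03", "m3", b"third")
print(reconcile(a, b))  # 2
```

After the pass, both machines hold all three messages and identical journals.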
Although hard drives have gotten bigger and cheaper, the speed at which they can retrieve data hasn’t changed much. This means that although we can pack more data on larger hard drives, the hard drives would eventually be unable to handle the rate of requests.
One technology that is promising in this area is Flash Storage (also called SSD, or Solid State Drive). SSDs use technology similar to what you'd find on an SD card or USB stick, but with a faster internal chipset and a much longer lifespan. A normal hard drive can perform a little more than one hundred read/write operations per second, whereas some of the fastest SSDs can do over one hundred thousand operations per second. However, this comes at a hefty price, as these devices are 10 to 100 times more expensive than hard drives when you look at what you pay per gigabyte of storage.
To explain how SSDs could help us, I’ll first describe how Hotmail stores your mailbox. In addition to storing the email messages themselves, we also track information about these messages (called metadata), such as the list of messages in your inbox, read/unread status of your messages, conversation threading, mobile phone synchronization, etc. This metadata takes up an extremely small fraction of our total storage space, but due to its constantly changing nature, it is responsible for most of the load on our hard drives.
By using SSDs for this small and rapidly changing set of data, and using the largest hard drives available for storing messages, we are able to take advantage of the trend in larger and cheaper hard drives without making any sacrifices in the performance of our system.
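A quick calculation shows why this split pays off. The per-device figures below are the rounded numbers quoted earlier in this post. The metadata load of 50,000 operations per second, the `tier_for` routing function, and the record kinds are hypothetical and only there to make the arithmetic concrete.

```python
import math

HDD_IOPS = 100        # "a little more than one hundred" operations/sec, rounded down
SSD_IOPS = 100_000    # "over one hundred thousand" for the fastest SSDs


def devices_needed(ops_per_second: int, iops_per_device: int) -> int:
    """Smallest number of devices that can absorb the given request rate."""
    return math.ceil(ops_per_second / iops_per_device)


def tier_for(kind: str) -> str:
    """Route small, rapidly changing metadata to SSD and message bodies to HDD."""
    return "ssd" if kind == "metadata" else "hdd"


metadata_load = 50_000  # assumed operations/sec, for illustration only
print(devices_needed(metadata_load, HDD_IOPS))  # 500
print(devices_needed(metadata_load, SSD_IOPS))  # 1
print(tier_for("metadata"), tier_for("message"))  # ssd hdd
```

Under these assumptions, serving the metadata from hard drives would take hundreds of drives purely for their operations per second, while a single fast SSD could absorb the same load. The bulky but rarely changing message bodies can then sit on the largest, cheapest drives.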
What happens if your account is still on one of our older machines? Don’t worry: since the older systems run on smaller hard drives, there are more than enough disk operations per second available to handle your inbox.
We’re extremely excited about our new storage system. The rollout has already begun, and all new clusters that we deploy going forward will use JBOD. We will also retrofit JBOD to our existing systems over time. We have about 30 million users on JBOD today, with another 100 million moving to the system over the next couple of months.
Our team is already planning and doing early design work for the next set of innovations, which will include hardware architecture changes and low-level software improvements to further increase the efficiency of our storage. We’re looking at patterns of email content and how our users access their data to inform our future designs.
These advancements will ensure that we can scale our service as we continue to expand our features for organizing your inbox, making you more productive, protecting you from spam, and providing you with the fastest, most reliable email service on the planet. Thanks for using Hotmail.
Fix Live Movie Maker please. That program is horrendous and needs an overhaul.
@lamarcheb. Thanks for the vote of confidence! We actually do expose tasks via ActiveSync. It's up to the various mobile clients to decide if they want to support a task management feature, though. The next version of Windows Phone WILL support task sync, for example. As for your issues with Outlook Connector, I'll reach out to you 1:1 to troubleshoot your issue. Thanks for being a dedicated supporter.
Excellent article. Thanks for sharing. I've given up on Yahoo! and Gmail for regular use with all the recent improvements. One day soon I hope Microsoft will allow the synchronization of tasks over Exchange ActiveSync, and improve the reliability of the Outlook Connector (which seems to have issues with recurring calendar data).
2011 and Hotmail is just moving to JBOD? That's funny, because Microsoft Exchange has been enabling people for JBOD since 2009 and has tens of millions of users on JBOD today in both enterprise and cloud models. And it can do all of that with cheap storage, no need for SSD drives.
I think Azure vs Server is matter of choice. In projects like Hotmail, you need all levels of access to the OS. And of course, they can get all level of access with Azure as well!
I believe, Azure encapsulates all the features present in Server and it's not the other way around.
Not using Azure/SQL Azure then? Doesn't exactly enhance Azure's credibility if (one of) Microsoft's biggest web property(ies) doesn't use it, does it?
Are you using any of the Azure stuff for hotmail? If not, it would be interesting to learn why.
Chris, very interesting. While I admit this is some cool development, I don't understand why everyone at Microsoft isn't dogfooding Azure. If Azure wouldn't meet your needs for whatever reason, maybe you could have worked with the Azure team to break through the technological hurdles, which would benefit not only Hotmail but everyone that uses Azure. I just think it's the direction every team at MS should go.
Thanks for your work!
So does Microsoft use a Hadoop-type distributed file system now? I'm surprised you would have held on to expensive SAN/RAID infrastructure so long!
@gorzko sorry to hear you are having activesync issues. let me know your email address through email or twitter (@ryanburk) and I can take a look.
@Chris Jones, thanks for the prompt reply and accepting the friend's invitation. I have sent you the details. I love the evolution process going on with Windows Live, especially the SkyDrive! :3 )
Better Hotmail? Is it possible? I love the changes!
"meeting the high standards we’ve set for the reliability and availability of our service"
"providing you with the fastest, most reliable email service on the planet"
Well, sounds great, but Hotmail sync (EAS) on my mobile phone just stopped working properly two days ago and it doesn't want to start working properly again. Planet? Well, you don't seem to be the most reliable in the town...
Love reading this kind of technical detail articles rather than PR stuff, keep up the great work!
@Chris Jones No, thank you for providing Hotmail for me to use. And including Contacts and Calendar and having it use Exchange Activesync. Used to just use it as a secondary e-mail account, but since getting a WindowsPhone 7, it has become my primary PIM. Didn't realize what I was missing.
@abm I sent you a follow on email to track down your issues. Thanks for using Hotmail!
Interesting! But I have two issues with Hotmail.
1. Can you explain why the SmartScreen filter shows me a warning on emails received from various reliable sources, including Microsoft's own family of websites (MSAnswers and WindowsTeamBlog), even though I have MSAnswers' email ID in my Safe List? windowslivehelp.com/thread.aspx
Is the email content ill-formatted or is it SmartScreen?
2. The Sweep feature has a checkbox for "Also block future messages", which doesn't work as intended: windowslivehelp.com/thread.aspx