On designing and deploying internet-scale services

  • Authors:
  • James Hamilton

  • Affiliations:
  • Windows Live Services Platform

  • Venue:
  • LISA'07 Proceedings of the 21st conference on Large Installation System Administration Conference
  • Year:
  • 2007

Quantified Score

Hi-index 0.02

Visualization

Abstract

The system-to-administrator ratio is commonly used as a rough metric to understand administrative costs in high-scale services. With smaller, less automated services this ratio can be as low as 2:1, whereas on industry leading, highly automated services, we've seen ratios as high as 2, 500:1. Within Microsoft services, Autopilot [1] is often cited as the magic behind the success of the Windows Live Search team in achieving high system-to-administrator ratios. While autoadministration is important, the most important factor is actually the service itself. Is the service efficient to automate? Is it what we refer to more generally as operations-friendly? Services that are operations-friendly require little human intervention, and both detect and recover from all but the most obscure failures without administrative intervention. This paper summarizes the best practices accumulated over many years in scaling some of the largest services at MSN and Windows Live.