LiveOps: systems management as a service

  • Authors:
  • Chad Verbowski; Juhan Lee; Xiaogang Liu; Roussi Roussev; Yi-Min Wang

  • Affiliations:
  • Microsoft Research; Microsoft MSN; Microsoft MSN; Florida Institute of Technology; Microsoft Research

  • Venue:
  • LISA '06 Proceedings of the 20th conference on Large Installation System Administration
  • Year:
  • 2006

Abstract

Existing management systems do not detect the most time-consuming and technically difficult anomalies that administrators encounter. Oppenheimer found that 33% of outages were caused by human error and that 76% of the time taken to resolve an outage was spent by humans determining what change was needed. Defining anomaly detection rules is challenging, and the rules often cannot be shared across organizations. Doing so requires deep combined knowledge of the software, workload, system configuration, and tuning parameters specific to the workload and overall distributed application topology. We present LiveOps, a scalable systems and security management service based on auditing the interactions between applications and the persistent state they use. This approach simplifies identifying security vulnerabilities, performs compliance auditing, enables forensic investigations, detects patching problems, optimizes troubleshooting, and detects malware and intrusions. The service enables knowledge sharing across organizations and administrative boundaries and allows for seamless integration between analysis results from disparate management products that build on it. Our configuration-free agent collects all read and write accesses to registry entries, files, and binaries, as well as process creations. The agent's streaming lossless compression creates log files of only 20 MB per day containing an average of 45 million events. The scalable LiveOps back-end service can analyze 1,000 machine-days of logs in 30 minutes. LiveOps agents have been deployed on 1,149 machines ranging from home systems to corporate desktops, including 381 production MSN servers across 11 sites.
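
As a rough back-of-envelope reading of the figures quoted in the abstract, the short Python sketch below derives the implied storage cost per event and the implied back-end throughput. It is illustrative only and rests on assumptions not stated in the abstract: that 1 MB means 10^6 bytes, that the 20 MB and 45 million events are per machine per day, and that every machine-day in the 1,000 machine-day batch carries the average event count. The constant and function names are hypothetical.

```python
# Figures quoted in the abstract.
LOG_BYTES_PER_DAY = 20 * 10**6   # 20 MB per day (assumption: 1 MB = 10^6 bytes, per machine)
EVENTS_PER_DAY = 45 * 10**6      # average events per day (assumption: per machine)
MACHINE_DAYS = 1000              # size of the analyzed batch, in machine-days
ANALYSIS_SECONDS = 30 * 60       # 30 minutes of back-end analysis time


def bytes_per_event():
    """Average compressed log size per recorded event."""
    return LOG_BYTES_PER_DAY / EVENTS_PER_DAY


def seconds_per_machine_day():
    """Average analysis time spent on one machine-day of logs."""
    return ANALYSIS_SECONDS / MACHINE_DAYS


def events_per_second():
    """Implied analysis throughput, assuming every machine-day holds the average event count."""
    return MACHINE_DAYS * EVENTS_PER_DAY / ANALYSIS_SECONDS


if __name__ == "__main__":
    print(f"bytes per event:         {bytes_per_event():.2f}")
    print(f"seconds per machine-day: {seconds_per_machine_day():.1f}")
    print(f"events per second:       {events_per_second():,.0f}")
```

Under these assumptions the logs average less than half a byte per event, each machine-day takes about 1.8 seconds to analyze, and the back end processes on the order of 25 million events per second.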