Magpie: online modelling and performance-aware systems

Authors:
Paul Barham;Rebecca Isaacs;Richard Mortier;Dushyanth Narayanan
Affiliations:
Microsoft Research Ltd., Cambridge, UK;Microsoft Research Ltd., Cambridge, UK;Microsoft Research Ltd., Cambridge, UK;Microsoft Research Ltd., Cambridge, UK
Venue:
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Year:
2003

Abstract

Understanding the performance of distributed systems requires correlation of thousands of interactions between numerous components -- a task best left to a computer. Today's systems provide voluminous traces from each component but do not synthesise the data into concise models of system performance. We argue that online performance modelling should be a ubiquitous operating system service and outline several uses including performance debugging, capacity planning, system tuning and anomaly detection. We describe the Magpie modelling service, which collates detailed traces from multiple machines in an e-commerce site, extracts request-specific audit trails, and constructs probabilistic models of request behaviour. A feasibility study evaluates the approach using an offline demonstrator. Results show that the approach is promising, but that there are many challenges to building a truly ubiquitous, online modelling infrastructure.
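
The pipeline the abstract outlines (collate traces, extract per-request audit trails, build a probabilistic model) can be illustrated with a minimal Python sketch. It is not Magpie's implementation. The record layout, the explicit `request_id` field, the synchronised timestamps, and the function names `extract_trails`, `fit_model` and `is_anomalous` are all assumptions made for illustration, and the "model" is just a mean and standard deviation of end-to-end latency.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical trace records gathered from several machines:
# (machine, request_id, timestamp_ms, event_name).
# An explicit request_id and a shared clock are simplifying assumptions.
events = [
    ("web1", "r1", 0, "http_recv"),
    ("db1", "r1", 4, "sql_start"),
    ("db1", "r1", 9, "sql_end"),
    ("web1", "r1", 12, "http_send"),
    ("web1", "r2", 1, "http_recv"),
    ("db1", "r2", 6, "sql_start"),
    ("db1", "r2", 10, "sql_end"),
    ("web1", "r2", 15, "http_send"),
]


def extract_trails(records):
    """Group events by request and order each group by timestamp."""
    trails = defaultdict(list)
    for machine, request_id, ts, name in records:
        trails[request_id].append((ts, machine, name))
    for trail in trails.values():
        trail.sort()
    return dict(trails)


def latency(trail):
    """End-to-end latency: last event time minus first event time."""
    return trail[-1][0] - trail[0][0]


def fit_model(trails):
    """Summarise request latency as (mean, population standard deviation)."""
    latencies = [latency(t) for t in trails.values()]
    return mean(latencies), pstdev(latencies)


def is_anomalous(trail, model, k=3.0):
    """Flag a request whose latency is more than k deviations from the mean."""
    mu, sigma = model
    return abs(latency(trail) - mu) > k * max(sigma, 1e-9)


trails = extract_trails(events)
model = fit_model(trails)
```

A new request's trail can then be checked with `is_anomalous(trail, model)`. A real service would need a richer model than a single latency statistic, and a way to attribute events to requests when no identifier is logged.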