Understanding real, large distributed systems can be as difficult and important as building them. Complex modern applications that span geographic and organizational boundaries confound performance analysis in challenging new ways. These systems clearly demand new analytic methods, but we are wary of approaches that suffer from the same problems as the systems themselves (e.g., complexity and opacity). This paper shows how to obtain valuable insight into the performance of globally distributed applications without abstruse techniques or detailed application knowledge: simple queueing-theoretic observations together with standard optimization methods yield remarkably accurate performance models. The models can be used for performance anomaly detection, i.e., distinguishing performance faults from mere overload. This distinction can in turn suggest both performance debugging tools and remedial measures. Extensive empirical results from three production systems serving real customers (two of which are globally distributed and span administrative domains) demonstrate that our method yields accurate performance models of diverse applications. Our method also flagged as anomalous an episode caused by a real performance bug in one of the three systems.
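To make the distinction between a performance fault and mere overload concrete, here is a minimal sketch. It is not the paper's actual model, which the abstract does not specify. It assumes that the total response time in a measurement window is roughly a weighted sum of the number of transactions of each type, fits the per-type weights with ordinary least squares as a stand-in for whichever optimization method the authors use, and flags a window whose observed total deviates from the prediction by more than a chosen tolerance. The function names, the two transaction types, the 25% tolerance, and all numbers are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_costs(counts, totals):
    """Least-squares fit of per-type costs via the normal equations.

    counts[w][j] is the number of type-j transactions in window w;
    totals[w] is the summed response time observed in window w.
    (Assumed model form: totals[w] ~ sum_j costs[j] * counts[w][j].)
    """
    k = len(counts[0])
    ata = [[sum(row[i] * row[j] for row in counts) for j in range(k)]
           for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(counts, totals))
           for i in range(k)]
    return solve_linear_system(ata, aty)


def predict(costs, row):
    """Predicted total response time for one window's transaction mix."""
    return sum(c * n for c, n in zip(costs, row))


def flag_anomalies(costs, counts, totals, tolerance=0.25):
    """Flag windows whose observed total is not explained by the workload."""
    flags = []
    for row, y in zip(counts, totals):
        expected = predict(costs, row)
        flags.append(abs(y - expected) > tolerance * expected)
    return flags


# Hypothetical training windows: (type-A count, type-B count) and total time.
train_counts = [(10, 5), (20, 5), (10, 15), (30, 10), (5, 20)]
train_totals = [4.5, 6.5, 9.5, 11.0, 11.0]
costs = fit_costs(train_counts, train_totals)

# Window 1: ten times the load, and ten times the total time (overload).
# Window 2: light load, but twice the time the model expects (anomaly).
new_counts = [(100, 50), (10, 5)]
new_totals = [45.0, 9.0]
flags = flag_anomalies(costs, new_counts, new_totals)
print(flags)  # [False, True]
```

In this toy data the first new window is slow only because it carries more work, so the model explains it; the second is slow despite a light workload, which is the kind of episode a workload-aware model can separate from overload.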