It is challenging to identify performance problems and pinpoint their root causes in complex systems, especially when a system supports a wide range of workloads and performance problems materialize only under particular workload conditions. This paper proposes a model-driven anomaly characterization approach and uses it to discover operating system performance bugs that affect disk I/O-intensive online servers. We construct a whole-system I/O throughput model as the reference for expected performance, and we use statistical clustering and characterization of performance anomalies to guide debugging. Unlike previous performance debugging methods, which offer detailed statistics at specific execution settings, our approach focuses on comprehensive anomaly characterization over wide ranges of workload conditions and system configurations. Our approach helps us quickly identify four performance bugs in the I/O system of the recent Linux 2.6.10 kernel: one in file system prefetching, two in the anticipatory I/O scheduler, and one in the elevator I/O scheduler. Our experiments with two Web server benchmarks, a trace-driven index searching server, and the TPC-C database benchmark show that the corrected kernel improves system throughput by up to a factor of five compared with the original kernel, with average improvements of 6%, 32%, 39%, and 16% for the four server workloads.
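The abstract does not spell out the throughput model or the clustering method, so the following is only a minimal Python sketch of the general idea of using a model as a performance reference, under loudly stated assumptions: `toy_model`, the parameter names, the 20% anomaly threshold, and the measurements are all hypothetical and made up for illustration. The sketch flags configurations whose measured throughput falls well short of the model's prediction, then reports the anomaly rate for each value of one parameter. That per-parameter breakdown is a deliberately simplified stand-in for statistical clustering and characterization, not the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """One measured run: workload/configuration parameters plus observed throughput."""
    params: dict          # hypothetical parameters, e.g. {"scheduler": "A", "concurrency": 8}
    measured_mbps: float  # measured throughput in MB/s


def toy_model(params):
    # HYPOTHETICAL reference model: throughput assumed to grow linearly
    # with concurrency and saturate at 40 MB/s for 8 or more streams.
    return 40.0 * min(params["concurrency"], 8) / 8


def anomaly_degree(predicted, measured):
    """Fraction by which measured throughput falls short of the prediction (0 if not below)."""
    if predicted <= 0:
        raise ValueError("predicted throughput must be positive")
    return max(0.0, 1.0 - measured / predicted)


def find_anomalies(samples, model, threshold=0.2):
    """Return (sample, degree) pairs whose shortfall versus the model exceeds the threshold."""
    result = []
    for s in samples:
        degree = anomaly_degree(model(s.params), s.measured_mbps)
        if degree > threshold:
            result.append((s, degree))
    return result


def anomaly_rate_by(samples, anomalies, key):
    """For each value of parameter `key`, the fraction of samples flagged as anomalous."""
    anomalous_ids = {id(s) for s, _ in anomalies}
    total = defaultdict(int)
    flagged = defaultdict(int)
    for s in samples:
        value = s.params[key]
        total[value] += 1
        if id(s) in anomalous_ids:
            flagged[value] += 1
    return {value: flagged[value] / total[value] for value in total}


# Synthetic, made-up measurements for illustration only.
samples = [
    Sample({"scheduler": "A", "concurrency": 8}, 39.0),  # predicted 40.0, shortfall 2.5%
    Sample({"scheduler": "A", "concurrency": 4}, 19.5),  # predicted 20.0, shortfall 2.5%
    Sample({"scheduler": "B", "concurrency": 8}, 20.0),  # predicted 40.0, shortfall 50%
    Sample({"scheduler": "B", "concurrency": 4}, 19.0),  # predicted 20.0, shortfall 5%
]

anomalies = find_anomalies(samples, toy_model)
print(anomaly_rate_by(samples, anomalies, "scheduler"))  # prints {'A': 0.0, 'B': 0.5}
```

Comparing against a model rather than against another measured configuration gives every sample its own expected value, so an anomaly can be detected even when no "known good" run exists for that exact workload. Concentrating anomalies by parameter value then suggests where in the system to look.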