Bug isolation via remote program sampling
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Scalable statistical bug isolation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Empirical evaluation of the tarantula automatic fault-localization technique
Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering
Statistical debugging: simultaneous identification of multiple bugs
ICML '06 Proceedings of the 23rd international conference on Machine learning
Problem diagnosis in large-scale computing environments
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Statistical Debugging: A Hypothesis Testing-Based Approach
IEEE Transactions on Software Engineering
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
A regression-based approach to scalability prediction
Proceedings of the 22nd annual international conference on Supercomputing
Lessons learned at 208K: towards debugging millions of cores
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Statistical Debugging Using Latent Topic Models
ECML '07 Proceedings of the 18th European conference on Machine Learning
HOLMES: Effective statistical debugging via efficient path profiling
ICSE '09 Proceedings of the 31st International Conference on Software Engineering
Breadcrumbs: efficient context sensitivity for dynamic bug detection analyses
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Vrisha: using scaling properties of parallel programs for bug detection and localization
Proceedings of the 20th international symposium on High performance distributed computing
ABHRANTA: locating bugs that manifest at large system scales
HotDep'12 Proceedings of the Eighth USENIX conference on Hot Topics in System Dependability
Hi-index | 0.00 |
A key challenge in developing large scale applications is finding bugs that are latent at the small scales of testing, but manifest themselves when the application is deployed at a large scale. Here, we ascribe a dual meaning to "large scale"---it could mean a large number of executing processes or applications ingesting large amounts of input data (or both). Traditional statistical techniques fail to detect or diagnose such kinds of bugs because no error-free run is available at the large deployment scales for training purposes. Prior work used scaling models to detect anomalous behavior at large scales without training on correct behavior at that scale. However, that work cannot localize bugs automatically, i.e. cannot pinpoint the region of code responsible for the error. In this paper, we resolve that shortcoming by making the following three contributions: (i) we develop an automatic diagnosis technique, based on feature reconstruction; (ii) we design a heuristic to effectively prune the large feature space; and (iii) we demonstrate that our system scales well, in terms of both accuracy and overhead. We validate our design through a large-scale fault-injection study and two case-studies of real-world bugs, finding that our system can effectively localize bugs in 92.5% of the cases, dramatically easing the challenge of finding bugs in large-scale programs.