Techniques for Classifying Executions of Deployed Software to Support Software Engineering Tasks

Authors:
Murali Haran;Alan Karr;Michael Last;Alessandro Orso;Adam A. Porter;Ashish Sanil;Sandro Fouche
Affiliations:
-;-;-;IEEE;IEEE;-;IEEE
Venue:
IEEE Transactions on Software Engineering
Year:
2007

Abstract

There is increasing interest in techniques that support analysis and measurement of fielded software systems. These techniques typically deploy numerous instrumented instances of a software system, collect execution data when the instances run in the field, and analyze the remotely collected data to better understand the system's in-the-field behavior. One common need for these techniques is the ability to distinguish execution outcomes (e.g., to collect only data corresponding to some behavior or to determine how often and under which conditions a specific behavior occurs). Most current approaches, however, do not perform any kind of classification of remote executions and either focus on easily observable behaviors (e.g., crashes) or assume that outcome classifications are provided externally (e.g., by the users). To address the limitations of existing approaches, we have developed three techniques for automatically classifying execution data as belonging to one of several classes. In this paper, we introduce our techniques and apply them to the binary classification of passing and failing behaviors. Our three techniques impose different overheads on program instances and, thus, each is appropriate for different application scenarios. We performed several empirical studies to evaluate and refine our techniques and to investigate the trade-offs among them. Our results show that 1) the first technique can build very accurate models, but requires a complete set of execution data; 2) the second technique produces slightly less accurate models, but needs only a small fraction of the total execution data; and 3) the third technique allows for even further cost reductions by building the models incrementally, but requires some sequential ordering of the software instances' instrumentation.