Concept-based failure clustering

Authors:
Nicholas DiGiuseppe; James A. Jones
Affiliations:
University of California, Irvine; University of California, Irvine
Venue:
Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering
Year:
2012

Abstract

When attempting to determine the number and set of execution failures that are caused by particular faults, developers must perform the arduous task of investigating and diagnosing each individual failure. Researchers have proposed failure-clustering techniques that automatically categorize failures, with the intention of isolating each culpable fault. Current techniques characterize each failure by its dynamic control flow and then cluster the failures on that basis. These techniques, however, are blind to the intent or purpose of each execution beyond what can be inferred from its control-flow profile. We hypothesize that semantically rich execution information can improve clustering effectiveness by categorizing failures according to the functionality they exercise in the software. This paper presents a novel clustering method that uses latent-semantic-analysis techniques to categorize each failure by the semantic concepts expressed in the executed source code. We present an experiment comparing this new technique to traditional control-flow-based clustering. The results showed that semantic-concept clustering was more precise in the number of clusters it produced than the traditional approach, without sacrificing cluster accuracy.
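The abstract does not spell out the pipeline, so what follows is only a minimal sketch of the general idea (concept-based clustering of failures via latent semantic analysis), not the authors' implementation. It assumes each failing execution is represented by the identifiers of the code it executed. Those identifiers are split on camelCase/snake_case boundaries and tf-idf weighted. LSA is computed as a truncated SVD, obtained here from a Jacobi eigendecomposition of the document Gram matrix instead of a linear-algebra library. Failures are then grouped by single-linkage clustering on cosine similarity at a fixed threshold. All function names, the weighting, the threshold, and the clustering rule are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split camelCase / snake_case identifiers into lowercase terms (simplified heuristic)."""
    words = []
    for part in re.split(r"[_\W]+", name):
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return [w.lower() for w in words]


def jacobi_eigen(m, sweeps=50, tol=1e-12):
    """Eigenvalues and eigenvectors (as columns of v) of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(m)
    a = [row[:] for row in m]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (rotate columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rotate rows p, q), zeroing a[p][q]
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P (accumulate eigenvectors)
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def lsa_vectors(docs, k):
    """Project each document (a list of terms) into a k-dimensional LSA concept space."""
    n = len(docs)
    counts = [Counter(d) for d in docs]
    vocab = sorted({t for d in docs for t in d})
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    # tf-idf weighting; "+ 1.0" keeps terms that occur in every document (an assumed choice)
    cols = [[c[t] * (math.log(n / df[t]) + 1.0) for t in vocab] for c in counts]
    # Gram matrix A^T A of the term-document matrix A; its eigenvalues are squared singular values
    gram = [[sum(x * y for x, y in zip(cols[i], cols[j])) for j in range(n)] for i in range(n)]
    vals, vecs = jacobi_eigen(gram)
    top = sorted(range(n), key=lambda i: vals[i], reverse=True)[:k]
    # document j's coordinate on concept i is sigma_i * V[j][i], i.e. a row of V_k S_k
    return [[math.sqrt(max(vals[i], 0.0)) * vecs[j][i] for i in top] for j in range(n)]


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def cluster(vectors, threshold):
    """Single-linkage clustering: connected components of the 'cosine >= threshold' graph."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


executed = [  # hypothetical identifiers from the code each failing run executed
    ["parseConfigFile", "readConfigLine", "config_value"],
    ["parseConfigFile", "configKey", "readConfigLine"],
    ["renderHtmlTable", "htmlCell", "table_row"],
    ["renderHtmlTable", "tableHeader", "htmlCell"],
]
docs = [[w for ident in run for w in split_identifier(ident)] for run in executed]
print(cluster(lsa_vectors(docs, k=2), threshold=0.5))  # [[0, 1], [2, 3]]
```

In this toy input the two configuration-parsing failures and the two table-rendering failures share no terms, so each pair collapses onto its own concept dimension and they separate cleanly. In a real setting, the choice of k and of the similarity threshold (both assumptions here) would largely determine how many clusters are produced.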