Evaluating three approaches to extracting fault data from software change repositories

Authors:
Tracy Hall; David Bowes; Gernot Liebchen; Paul Wernick
Affiliations:
Department of Information Systems & Computing, Brunel University, Uxbridge, Middlesex, UK; School of Computer Science, University of Hertfordshire, Hatfield, Hertfordshire, UK; Department of Information Systems & Computing, Brunel University, Uxbridge, Middlesex, UK; School of Computer Science, University of Hertfordshire, Hatfield, Hertfordshire, UK
Venue:
PROFES '10: Proceedings of the 11th International Conference on Product-Focused Software Process Improvement
Year:
2010


Abstract

Software products can only be improved if we have a good understanding of the faults they typically contain. Code faults are a significant source of software product problems, yet we currently do not understand them sufficiently. Open source change repositories are potentially a rich and valuable source of fault data for both researchers and practitioners. Such fault data can be used to better understand current product problems so that we can predict and address future ones. However, extracting fault data from change repositories is difficult. In this paper we compare the performance of three approaches to extracting fault data from the change repository of the Barcode Open Source System. Our main finding is that we have the most confidence in our manual evaluation of diffs to identify fault-fixing changes. We have less confidence in the ability of the two automatic approaches to separate fault-fixing from non-fault-fixing changes. We conclude that it is very difficult to reliably extract fault-fixing data from change repositories, especially using automatic tools, and that we need to be cautious when reporting or using such data.