Does measuring code change improve fault prediction?

  • Authors: Robert M. Bell; Thomas J. Ostrand; Elaine J. Weyuker

  • Affiliations: AT&T Labs - Research, Florham Park, NJ; AT&T Labs - Research, Florham Park, NJ; AT&T Labs - Research, Florham Park, NJ

  • Venue: Proceedings of the 7th International Conference on Predictive Models in Software Engineering
  • Year: 2011

Abstract

Background: Several studies have examined code churn as a variable for predicting faults in large software systems. High churn is usually associated with more faults appearing in code that has been changed frequently.

Aims: We investigate the extent to which faults can be predicted by the degree of churn alone, whether other code characteristics occur together with churn, and which combinations of churn and other characteristics provide the best predictions. We also investigate different types of churn, including both additions to and deletions from code, as well as overall amount of change to code.

Method: We have mined the version control database of a large software system to collect churn and other software measures from 18 successive releases of the system. We examine the frequency of faults plotted against various code characteristics, and evaluate a diverse set of prediction models based on many different combinations of independent variables, including both absolute and relative churn.

Results: Churn measures based on counts of lines added, deleted, and modified are very effective for fault prediction. Individually, counts of adds and modifications outperform counts of deletes, while the sum of all three counts was most effective. However, these counts did not improve prediction accuracy relative to a model that included a simple count of the number of times that a file had been changed in the prior release.

Conclusions: Including a measure of change in the prior release is an essential component of our fault prediction method. Various measures seem to work roughly equivalently.
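
Illustration: the churn measures named in the abstract (lines added, deleted, and modified, their sum, and a relative variant) can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' implementation: the `FileChange` record, its field names, and the definition of relative churn as total churn divided by file size are all hypothetical choices made for the example, and the abstract does not specify how relative churn was computed.

```python
from dataclasses import dataclass


@dataclass
class FileChange:
    """Hypothetical per-file change summary for one release (names are assumptions)."""
    path: str
    lines_added: int
    lines_deleted: int
    lines_modified: int
    num_changes: int  # how many times the file was changed in the release
    loc: int          # file size in lines of code (assumed normalizer)


def absolute_churn(fc: FileChange) -> int:
    """Sum of lines added, deleted, and modified."""
    return fc.lines_added + fc.lines_deleted + fc.lines_modified


def relative_churn(fc: FileChange) -> float:
    """Absolute churn divided by file size (assumed definition); 0.0 for empty files."""
    if fc.loc == 0:
        return 0.0
    return absolute_churn(fc) / fc.loc


def rank_files(files, measure):
    """Order files from highest to lowest value of the given measure."""
    return sorted(files, key=measure, reverse=True)


# Made-up example data, for illustration only.
files = [
    FileChange("a.c", lines_added=10, lines_deleted=5, lines_modified=5, num_changes=3, loc=200),
    FileChange("b.c", lines_added=2, lines_deleted=0, lines_modified=1, num_changes=4, loc=10),
]

by_churn = [f.path for f in rank_files(files, absolute_churn)]
by_relative = [f.path for f in rank_files(files, relative_churn)]
by_changes = [f.path for f in rank_files(files, lambda f: f.num_changes)]
```

In this made-up data, the ranking by absolute churn differs from the rankings by relative churn and by simple change count, which is the kind of comparison between measures that the abstract describes.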