Automating algorithms for the identification of fault-prone files

  • Authors:
  • Thomas J. Ostrand;Elaine J. Weyuker;Robert M. Bell

  • Affiliations:
  • AT&T Labs;AT&T Labs;AT&T Labs

  • Venue:
  • Proceedings of the 2007 International Symposium on Software Testing and Analysis
  • Year:
  • 2007

Abstract

This research investigates ways of predicting which files are most likely to contain large numbers of faults in the next release of a large industrial software system. Previous work made predictions using several different models, ranging from a simple, fully automatable model (the LOC model) to several variants of a negative binomial regression model that were customized for the particular software system under study. Not surprisingly, the custom models invariably predicted faults more accurately than the simple model. However, developing customized models requires substantial time and analytic effort, as well as statistical expertise. We now introduce new, more sophisticated models that yield more accurate predictions than the earlier LOC model but can nonetheless be fully automated. We also extend our earlier research by presenting another large-scale empirical study of the value of these prediction models, using a new industrial software system over a nine-year period.
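
As a rough illustration of why a size-based model can be fully automated, the sketch below ranks files by line count and measures what share of known faults falls in the top-ranked files. It assumes that the LOC model simply orders files from largest to smallest, which the abstract does not spell out. The file names, fault counts, the 20% cutoff, and the function names are all hypothetical and are not taken from the paper.

```python
def rank_by_loc(loc_by_file):
    """Order file names from most to fewest lines of code (ties broken by name).

    Assumption: the LOC model predicts that larger files are more fault-prone.
    """
    return sorted(loc_by_file, key=lambda name: (-loc_by_file[name], name))


def fault_share_in_top(ranked_files, faults_by_file, top_percent):
    """Return the fraction of all faults found in the top `top_percent` of files."""
    total_faults = sum(faults_by_file.values())
    if total_faults == 0:
        return 0.0
    # Integer arithmetic avoids floating-point surprises; keep at least one file.
    cutoff = max(1, len(ranked_files) * top_percent // 100)
    captured = sum(faults_by_file.get(name, 0) for name in ranked_files[:cutoff])
    return captured / total_faults


# Hypothetical data: lines of code and faults per file for one release.
loc = {"a.c": 1200, "b.c": 300, "c.c": 4500, "d.c": 80, "e.c": 950}
faults = {"a.c": 3, "b.c": 0, "c.c": 6, "d.c": 0, "e.c": 1}

ranked = rank_by_loc(loc)
share = fault_share_in_top(ranked, faults, top_percent=20)
print(ranked)
print(f"Share of faults in the top 20% of files: {share:.0%}")
```

Because the ranking needs only file sizes, it can be produced without any statistical tuning. The customized regression models described in the abstract trade that simplicity for accuracy.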