Statistical models vs. expert estimation for fault prediction in modified code - an industrial case study

  • Authors:
  • Piotr Tomaszewski; Jim Håkansson; Håkan Grahn; Lars Lundberg

  • Affiliations:
  • Department of Systems and Software Engineering, School of Engineering, Blekinge Institute of Technology, SE-372 25 Ronneby, Sweden (all authors)

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2007

Abstract

Statistical fault prediction models and expert estimations are two popular methods for deciding where to focus fault detection efforts when the fault detection budget is limited. In this paper, we present a study in which we empirically compare the accuracy of fault prediction offered by statistical prediction models with the accuracy of expert estimations. The study is performed in an industrial setting. We invited eleven experts who are involved in the development of two large telecommunication systems. Our statistical prediction models are built on historical data describing one release of one of those systems. We compare the performance of these statistical fault prediction models with the performance of our experts when predicting faults in the latest releases of both systems. We show that the statistical methods clearly outperform the expert estimations. We attribute the superiority of the statistical models mainly to their ability to cope with large datasets. This makes it possible for statistical models to make reliable predictions for all components in the system. It also enables prediction at a more fine-grained level, e.g., at the class level instead of the component level. We show that such fine-grained prediction is better from both the theoretical and the practical perspective.