Iterative identification of fault-prone binaries using in-process metrics

  • Authors:
  • Lucas Layman;Gunnar Kudrjavets;Nachiappan Nagappan

  • Affiliations:
  • North Carolina State University, Raleigh, NC, USA;Microsoft Corporation, Redmond, WA, USA;Microsoft Corporation, Redmond, WA, USA

  • Venue:
  • Proceedings of the Second ACM-IEEE international symposium on Empirical software engineering and measurement
  • Year:
  • 2008

Abstract

Code churn, the amount of code change taking place within a software unit over time, has been correlated with fault-proneness in software systems. We investigate the use of code churn and static metrics, collected at regular intervals during the development cycle, to predict faults in an iterative, in-process manner. We collected 159 churn and structure metrics from six four-month snapshots of a 1 million LOC Microsoft product. The number of software faults fixed during each period was recorded per binary module. Using stepwise logistic regression, we created a prediction model that identifies fault-prone binaries using three parameters: code churn (the number of new and changed blocks), and class Fan In and class Fan Out (normalized by lines of code). The iteratively built model classifies binaries as fault-prone or non-fault-prone with 80.0% accuracy. Such fault-prediction models have the advantage of letting engineers observe how their fault-prediction profile evolves over time.
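The iterative scheme described in the abstract can be illustrated with a minimal sketch: fit a logistic regression on all snapshots seen so far, then classify the binaries in the next snapshot. Everything here is an assumption made for illustration and is not the authors' implementation: the data are synthetic, the function names (`fit_logistic`, `make_snapshot`, `iterative_accuracy`) are hypothetical, the coefficients used to generate labels are arbitrary, and plain gradient-descent logistic regression on three fixed features stands in for the stepwise selection over 159 metrics.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear(w, x):
    # w[0] is the bias; w[1:] pair up with the features in x.
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def fit_logistic(rows, labels, epochs=300, lr=1.0):
    """Fit weights [bias, w1, ...] by batch gradient descent on the log-loss."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(linear(w, x)) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def predict(w, x):
    # Classify as fault-prone (1) when the predicted probability is >= 0.5.
    return 1 if linear(w, x) >= 0.0 else 0

def make_snapshot(rng, n_binaries=200):
    """Synthetic snapshot (assumption): each binary has three features in [0, 1]
    standing in for churn, Fan In / LOC, and Fan Out / LOC, plus a noisy label."""
    rows, labels = [], []
    for _ in range(n_binaries):
        x = [rng.random(), rng.random(), rng.random()]
        # Arbitrary illustrative coefficients, not values from the paper.
        score = 3.0 * x[0] + 1.5 * x[1] + 1.5 * x[2] - 3.0 + rng.gauss(0, 0.5)
        rows.append(x)
        labels.append(1 if score > 0 else 0)
    return rows, labels

def iterative_accuracy(snapshots):
    """Train on snapshots 1..k, test on snapshot k+1, for each k."""
    accuracies = []
    train_rows, train_labels = [], []
    for k in range(len(snapshots) - 1):
        rows, labels = snapshots[k]
        train_rows = train_rows + rows
        train_labels = train_labels + labels
        w = fit_logistic(train_rows, train_labels)
        test_rows, test_labels = snapshots[k + 1]
        hits = sum(predict(w, x) == y for x, y in zip(test_rows, test_labels))
        accuracies.append(hits / len(test_rows))
    return accuracies

rng = random.Random(0)
snapshots = [make_snapshot(rng) for _ in range(6)]  # six snapshots, as in the study
accuracies = iterative_accuracy(snapshots)
for k, acc in enumerate(accuracies, start=1):
    print(f"trained on snapshots 1..{k}, tested on snapshot {k + 1}: accuracy {acc:.2f}")
```

The accuracies printed by this sketch reflect only the synthetic data and say nothing about the 80.0% figure reported for the Microsoft product. The point of the design is that each model uses only information available at that moment in the development cycle, so the sequence of results shows how the fault-prediction profile changes from one period to the next.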