Classification of Fault-Prone Software Modules: Prior Probabilities, Costs, and Model Evaluation

  • Authors:
  • Taghi M. Khoshgoftaar; Edward B. Allen

  • Affiliations:
  • Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431; Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431

  • Venue:
  • Empirical Software Engineering
  • Year:
  • 1998

Abstract

Software quality models can give timely predictions of reliability indicators, for targeting software improvement efforts. In some cases, classification techniques are sufficient for useful software quality models. The software engineering community has not applied informed prior probabilities widely to software quality classification modeling studies. Moreover, even though costs are of paramount concern to software managers, costs of misclassification have received little attention in the software engineering literature. This paper applies informed prior probabilities and costs of misclassification to software quality classification. We also discuss the advantages and limitations of several statistical methods for evaluating the accuracy of software quality classification models. We conducted two full-scale industrial case studies which integrated these concepts with nonparametric discriminant analysis to illustrate how they can be used by a classification technique. The case studies supported our hypothesis that classification models of software quality can benefit by considering informed prior probabilities and by minimizing the expected cost of misclassifications. The case studies also illustrated the advantages and limitations of resubstitution, cross-validation, and data splitting for model evaluation.
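
As a rough illustration of what minimizing the expected cost of misclassification can look like, the sketch below applies the textbook two-group decision rule, which weighs each class density by its prior probability and by the cost of getting that class wrong. It is not taken from the paper: the function name `classify_module`, its parameter names, and all numeric values are hypothetical, and the density values are assumed to come from some estimator (for example, a kernel density estimate in nonparametric discriminant analysis).

```python
def classify_module(density_nfp, density_fp, prior_nfp, prior_fp,
                    cost_type1, cost_type2):
    """Pick the label with the lower expected misclassification cost.

    density_nfp, density_fp: estimated class densities at the module's
        metric vector (assumed to be supplied by the caller).
    prior_nfp, prior_fp: prior probabilities of the two classes.
    cost_type1: cost of calling a not-fault-prone module fault-prone.
    cost_type2: cost of calling a fault-prone module not fault-prone.
    """
    # Expected cost of answering "not fault-prone": we pay cost_type2
    # whenever the module is in fact fault-prone.
    cost_if_nfp = cost_type2 * prior_fp * density_fp
    # Expected cost of answering "fault-prone": we pay cost_type1
    # whenever the module is in fact not fault-prone.
    cost_if_fp = cost_type1 * prior_nfp * density_nfp
    return "not fault-prone" if cost_if_nfp < cost_if_fp else "fault-prone"


# Hypothetical values: equal densities, 10% of modules fault-prone.
equal_costs = classify_module(0.5, 0.5, 0.9, 0.1, cost_type1=1, cost_type2=1)
costly_miss = classify_module(0.5, 0.5, 0.9, 0.1, cost_type1=1, cost_type2=20)
```

With equal costs the informed prior dominates and the module is labeled not fault-prone; once a missed fault-prone module is assumed to cost twenty times more than a false alarm, the same module is labeled fault-prone.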