Balancing Misclassification Rates in Classification-Tree Models of Software Quality

  • Authors:
  • Taghi M. Khoshgoftaar;Xiaojing Yuan;Edward B. Allen

  • Affiliations:
  • Florida Atlantic University, Boca Raton, Florida USA;Florida Atlantic University, Boca Raton, Florida USA;Mississippi State University, Mississippi USA

  • Venue:
  • Empirical Software Engineering
  • Year:
  • 2000

Abstract

Software product and process metrics can be useful predictors of which modules are likely to have faults during operations. Developers and managers can use such predictions by software quality models to focus enhancement efforts before release. However, in practice, software quality modeling methods in the literature may not produce a useful balance between the two kinds of misclassification rates, especially when there are few faulty modules. This paper presents a practical classification rule in the context of classification tree models that allows appropriate emphasis on each type of misclassification according to the needs of the project. This is especially important when the faulty modules are rare. An industrial case study using classification trees illustrates the tradeoffs. The trees were built using the TREEDISC algorithm, which is a refinement of the CHAID algorithm. We examined two releases of a very large telecommunications system, and built models suited to two points in the development life cycle: the end of coding and the end of beta testing. Both trees had only five significant predictors, out of 28 and 42 candidates, respectively. We interpreted the structure of the classification trees, and we found the models had useful accuracy.
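
The abstract does not spell out the classification rule, so the following is only an illustrative sketch of how a tunable rule can trade one misclassification rate against the other. It assumes (this is not taken from the paper) that each tree leaf is labeled fault-prone when its fraction of faulty modules reaches a chosen threshold. The function names, the threshold values, and the leaf counts in `LEAVES` are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical leaf counts: (not_faulty_modules, faulty_modules) per tree leaf.
# These numbers are invented for illustration, not data from the case study.
LEAVES: List[Tuple[int, int]] = [(900, 10), (60, 15), (30, 25), (10, 50)]


def classify_leaves(leaves: List[Tuple[int, int]], threshold: float) -> List[bool]:
    """Flag a leaf as fault-prone when its faulty fraction reaches the threshold.

    Assumed rule for this sketch; the paper's actual rule may differ.
    """
    return [faulty / (ok + faulty) >= threshold for ok, faulty in leaves]


def misclassification_rates(
    leaves: List[Tuple[int, int]], threshold: float
) -> Tuple[float, float]:
    """Return (type_1_rate, type_2_rate) for the given threshold.

    Type I: not-faulty modules flagged as fault-prone.
    Type II: faulty modules not flagged.
    """
    flags = classify_leaves(leaves, threshold)
    total_ok = sum(ok for ok, _ in leaves)
    total_faulty = sum(faulty for _, faulty in leaves)
    type_1 = sum(ok for (ok, _), flagged in zip(leaves, flags) if flagged) / total_ok
    type_2 = sum(
        faulty for (_, faulty), flagged in zip(leaves, flags) if not flagged
    ) / total_faulty
    return type_1, type_2


if __name__ == "__main__":
    # Compare a majority-vote threshold with a lower one.
    for threshold in (0.5, 0.15):
        t1, t2 = misclassification_rates(LEAVES, threshold)
        print(f"threshold={threshold}: Type I={t1:.2f}, Type II={t2:.2f}")
```

With these made-up counts, a majority-vote threshold of 0.5 misses half of the faulty modules because they are rare, while lowering the threshold to 0.15 accepts more false alarms in exchange for far fewer missed faulty modules. A project would choose the threshold according to the relative cost of each kind of error.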