Assessing Uncertain Predictions of Software Quality

  • Authors:
  • Taghi M. Khoshgoftaar, Edward B. Allen, Xiaojing Yuan, Wendell D. Jones, John P. Hudepohl


  • Venue:
  • METRICS '99 Proceedings of the 6th International Symposium on Software Metrics
  • Year:
  • 1999

Abstract

Many development organizations try to minimize faults in software as a means of improving customer satisfaction. Assuring high software quality often entails time-consuming and costly development processes. A software quality model based on software metrics can be used to guide enhancement efforts by predicting which modules are fault-prone. This paper presents a way to determine which predictions by a classification tree should be considered uncertain.

We conducted a case study of a large legacy telecommunications system. One release was the basis for the training data set, and the subsequent release was the basis for the evaluation data set. We built a classification tree using the treedisc algorithm, which is based on chi-squared tests of contingency tables. The model predicted whether or not a module was likely to have faults discovered by customers, based on software product, process, and execution metrics. We simulated practical use of the model by classifying the modules in the evaluation data set. The model achieved useful accuracy, in spite of the very small proportion of fault-prone modules in the system.

We assessed whether the classes assigned to the leaves were appropriate by examining the details of the full tree, and found sizable subsets of modules with substantially uncertain classifications. Discovering which modules have uncertain classifications allows sophisticated enhancement strategies to resolve uncertainties. Moreover, treedisc is especially well suited to identifying uncertain classifications.
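The core mechanism the abstract describes, selecting tree splits by chi-squared tests on contingency tables, can be illustrated with a minimal sketch. This is not the paper's treedisc implementation; the module data, metric, and candidate thresholds below are hypothetical, and only the chi-squared split-selection idea is taken from the abstract.

```python
# Sketch of chi-squared split selection for a fault-proneness tree.
# Rows of the 2x2 table: metric <= t vs. metric > t;
# columns: not fault-prone vs. fault-prone.

def chi_squared(table):
    """Pearson chi-squared statistic for a 2x2 contingency table."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat

def best_split(metric_values, fault_prone, thresholds):
    """Return the candidate threshold with the largest chi-squared
    statistic, i.e. the split most associated with fault-proneness."""
    best = None
    for t in thresholds:
        table = [[0, 0], [0, 0]]
        for x, y in zip(metric_values, fault_prone):
            table[0 if x <= t else 1][1 if y else 0] += 1
        stat = chi_squared(table)
        if best is None or stat > best[1]:
            best = (t, stat)
    return best

# Hypothetical modules: lines of code vs. whether customers found faults.
loc = [120, 300, 80, 450, 200, 60, 500, 90]
faulty = [0, 1, 0, 1, 0, 0, 1, 0]
threshold, stat = best_split(loc, faulty, thresholds=[100, 250, 400])
print(threshold, stat)  # → 250 8.0 (a perfect split on this toy data)
```

A leaf whose contingency table yields a weak chi-squared statistic, or a nearly even class mix, is a candidate for the "uncertain classification" label the paper proposes.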