Uncertain Classification of Fault-Prone Software Modules

Authors:
Taghi M. Khoshgoftaar;Xiaojing Yuan;Edward B. Allen;Wendell D. Jones;John P. Hudepohl
Affiliations:
Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA/ taghi@cse.fau.edu;Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA/ xyuan@cse.fau.edu;Mississippi State University, Mississippi, USA/ edward.allen@computer.org;Nortel Networks, Research Triangle Park, North Carolina, USA/ wjones@asciences.com;Nortel Networks, Research Triangle Park, North Carolina, USA/ hudepohl@nortelnetworks.com
Venue:
Empirical Software Engineering
Year:
2002

Abstract

Many development organizations try to minimize faults in software as a means of improving customer satisfaction. Assuring high software quality often entails time-consuming and costly development processes. A software quality model based on software metrics can be used to guide enhancement efforts by predicting which modules are fault-prone. This paper presents statistical techniques to determine which predictions by a classification tree should be considered uncertain. We conducted a case study of a large legacy telecommunications system. One release was the basis for the training dataset, and the subsequent release was the basis for the evaluation dataset. We built a classification tree using the TREEDISC algorithm, which is based on χ² tests of contingency tables. The model predicted whether or not a module was likely to have faults discovered by customers, based on software product, process, and execution metrics. We simulated practical use of the model by classifying the modules in the evaluation dataset. The model achieved useful accuracy, in spite of the very small proportion of fault-prone modules in the system. We used statistical tests to assess whether the classes assigned to the leaves were appropriate, and found sizable subsets of modules with uncertain classification. Discovering which modules have uncertain classifications allows sophisticated enhancement strategies to resolve uncertainties. Moreover, TREEDISC is especially well suited to identifying uncertain classifications.
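
The idea of flagging a leaf's class assignment as uncertain can be illustrated with a minimal sketch. The abstract does not say which statistical tests the paper applies, so everything below is an assumption made for illustration: the exact one-sided binomial test, the decision threshold on the proportion of fault-prone modules, the significance level `alpha`, and the function names `binomial_tail` and `classify_leaf`. A leaf keeps its class only when its observed proportion of fault-prone modules differs significantly from the threshold; otherwise it is reported as uncertain.

```python
from math import comb


def binomial_tail(k: int, n: int, p: float, upper: bool) -> float:
    """Exact P(X >= k) if upper, else P(X <= k), for X ~ Binomial(n, p)."""
    ks = range(k, n + 1) if upper else range(0, k + 1)
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in ks)


def classify_leaf(n_fault_prone: int, n_total: int,
                  threshold: float, alpha: float = 0.05) -> str:
    """Label one tree leaf, or report it as uncertain.

    Hypothetical rule (not taken from the paper): the leaf is nominally
    fault-prone when its observed proportion of fault-prone modules is at
    least `threshold`. The label is kept only if a one-sided exact binomial
    test rejects the hypothesis that the true proportion equals `threshold`.
    """
    observed = n_fault_prone / n_total
    if observed >= threshold:
        label = "fault-prone"
        p_value = binomial_tail(n_fault_prone, n_total, threshold, upper=True)
    else:
        label = "not fault-prone"
        p_value = binomial_tail(n_fault_prone, n_total, threshold, upper=False)
    return label if p_value < alpha else "uncertain"


if __name__ == "__main__":
    # Illustrative leaf counts only; these are not data from the case study.
    for k, n in [(30, 100), (0, 200), (1, 10), (0, 10)]:
        print(k, n, classify_leaf(k, n, threshold=0.05))
```

Under this rule, small leaves tend to be labeled uncertain because a handful of modules gives too little evidence either way, which is the kind of subset a follow-up enhancement strategy would examine more closely.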