Balancing Misclassification Rates in Classification-Tree Models of Software Quality

Authors:
Taghi M. Khoshgoftaar; Xiaojing Yuan; Edward B. Allen
Affiliations:
Florida Atlantic University, Boca Raton, Florida USA; Florida Atlantic University, Boca Raton, Florida USA; Mississippi State University, Mississippi USA
Venue:
Empirical Software Engineering
Year:
2000

Abstract

Software product and process metrics can be useful predictors of which modules are likely to have faults during operations. Developers and managers can use such predictions by software quality models to focus enhancement efforts before release. However, in practice, software quality modeling methods in the literature may not produce a useful balance between the two kinds of misclassification rates, especially when there are few faulty modules. This paper presents a practical classification rule in the context of classification tree models that allows appropriate emphasis on each type of misclassification according to the needs of the project. This is especially important when the faulty modules are rare. An industrial case study using classification trees illustrates the tradeoffs. The trees were built using the TREEDISC algorithm, which is a refinement of the CHAID algorithm. We examined two releases of a very large telecommunications system, and built models suited to two points in the development life cycle: the end of coding and the end of beta testing. Both trees had only five significant predictors, out of 28 and 42 candidates, respectively. We interpreted the structure of the classification trees, and we found the models had useful accuracy.
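
The tradeoff between the two kinds of misclassification can be illustrated with a minimal sketch. It is not the rule from the paper, which the abstract does not spell out. It assumes a generic threshold rule: a leaf of an already-built tree is labeled fault-prone when the fraction of fault-prone training modules in it reaches a chosen threshold. The names `Leaf`, `classify_leaf`, `misclassification_rates`, and the leaf counts are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    """Training-module counts in one leaf of a tree (hypothetical data)."""
    n_fault_prone: int
    n_not_fault_prone: int


def classify_leaf(leaf: Leaf, threshold: float) -> bool:
    """Assumed rule: label the leaf fault-prone when its fault-prone
    fraction is at least `threshold`."""
    total = leaf.n_fault_prone + leaf.n_not_fault_prone
    return total > 0 and leaf.n_fault_prone / total >= threshold


def misclassification_rates(leaves, threshold):
    """Return (false_alarm_rate, miss_rate).

    false_alarm_rate: not-fault-prone modules labeled fault-prone,
                      divided by all not-fault-prone modules.
    miss_rate:        fault-prone modules labeled not fault-prone,
                      divided by all fault-prone modules.
    """
    false_alarms = sum(l.n_not_fault_prone for l in leaves
                       if classify_leaf(l, threshold))
    misses = sum(l.n_fault_prone for l in leaves
                 if not classify_leaf(l, threshold))
    total_nfp = sum(l.n_not_fault_prone for l in leaves)
    total_fp = sum(l.n_fault_prone for l in leaves)
    return false_alarms / total_nfp, misses / total_fp


# Made-up leaves in which fault-prone modules are rare overall (42 of 200).
leaves = [Leaf(2, 98), Leaf(10, 40), Leaf(30, 20)]

for threshold in (0.5, 0.1):
    false_alarm_rate, miss_rate = misclassification_rates(leaves, threshold)
    print(f"threshold={threshold}: "
          f"false alarms={false_alarm_rate:.3f}, misses={miss_rate:.3f}")
```

In this sketch, lowering the threshold flags more leaves, which trades a lower miss rate for a higher false-alarm rate. A project would pick the threshold that gives the balance it needs.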