Assessment of a New Three-Group Software Quality Classification Technique: An Empirical Case Study

Authors:
Taghi M. Khoshgoftaar;Naeem Seliya;Kehan Gao
Affiliations:
Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, USA 33431;Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, USA 33431;Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, USA 33431
Venue:
Empirical Software Engineering
Year:
2005

Abstract

The primary aim of risk-based software quality classification models is to detect, prior to testing or operations, the components that are most likely to be high-risk. Their practical usefulness as quality assurance tools is gauged by the prediction accuracy and cost-effectiveness of the models. Classifying modules into two risk groups is the more common practice. Such models assume that all modules predicted as high-risk will be subjected to quality improvements. Because reliability improvement resources are always limited and the quality risk factor varies, a more focused classification model may be desired to achieve cost-effective software quality assurance goals. In such cases, calibrating a three-group (high-risk, medium-risk, and low-risk) classification model is more rewarding. We present an innovative method that circumvents the complexities, computational overhead, and difficulties involved in calibrating pure or direct three-group classification models. With the proposed method, practitioners can apply an existing two-group classification algorithm three times to yield the three risk-based classes. An empirical approach is taken to investigate the effectiveness and validity of the proposed technique. Some commonly used classification techniques are studied to demonstrate the proposed methodology. They include the C4.5 decision tree algorithm, discriminant analysis, and case-based reasoning. For the first two, we compare the three-group model calibrated directly with the respective technique against the one built by applying the proposed method. Any two-group classification technique can be employed by the proposed method, including those that do not provide a direct three-group classification model, e.g., logistic regression and certain binary classification trees, such as CART. Based on a case study of a large-scale industrial software system, it is observed that the proposed method yielded promising results.
For a given classification technique, the expected cost of misclassification of the proposed three-group models was generally significantly better than that of the technique's direct three-group model. In addition, the proposed method is also evaluated against an alternative indirect three-group classification method.
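
The idea of reusing a two-group classifier three times, and of scoring the result by expected cost of misclassification, can be illustrated with a minimal Python sketch. This is not the authors' procedure: the abstract does not say how the three two-group results are combined, so the sketch assumes a pairwise (one-vs-one) decomposition with majority voting, with a three-way tie resolved to medium-risk. The threshold-based models, the single metric, the sample modules, and the cost table are all hypothetical stand-ins; any trained two-group classifier could take the place of `threshold_model`.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

LOW, MEDIUM, HIGH = "low", "medium", "high"

# A two-group model maps a module's metric vector to one of its two labels.
TwoGroupModel = Callable[[Sequence[float]], str]


def threshold_model(index: int, cutoff: float, below: str, above: str) -> TwoGroupModel:
    """Hypothetical stand-in for any trained two-group classifier."""
    def predict(metrics: Sequence[float]) -> str:
        return above if metrics[index] > cutoff else below
    return predict


def classify_three_group(metrics: Sequence[float], models: Sequence[TwoGroupModel]) -> str:
    """Combine three pairwise two-group models by majority vote (an assumption)."""
    votes = Counter(model(metrics) for model in models)
    label, count = votes.most_common(1)[0]
    # Each class appears in only two of the three pairs, so a winner has
    # exactly two votes; a 1-1-1 split is a tie, resolved here to MEDIUM.
    return label if count >= 2 else MEDIUM


def expected_cost(actual: Sequence[str], predicted: Sequence[str],
                  cost: Dict[Tuple[str, str], float]) -> float:
    """Average misclassification cost per module over an evaluation set."""
    total = sum(cost.get((a, p), 0.0) for a, p in zip(actual, predicted))
    return total / len(actual)


# Three applications of the same (hypothetical) two-group technique.
MODELS: List[TwoGroupModel] = [
    threshold_model(0, 10.0, LOW, MEDIUM),   # low vs. medium
    threshold_model(0, 30.0, MEDIUM, HIGH),  # medium vs. high
    threshold_model(0, 20.0, LOW, HIGH),     # low vs. high
]

# Hypothetical costs keyed by (actual, predicted); correct predictions cost 0.
COSTS: Dict[Tuple[str, str], float] = {
    (HIGH, LOW): 10.0, (HIGH, MEDIUM): 5.0, (MEDIUM, LOW): 3.0,
    (MEDIUM, HIGH): 1.0, (LOW, MEDIUM): 1.0, (LOW, HIGH): 2.0,
}

# Hypothetical modules, each described by one metric, with assumed true classes.
modules = [[5.0], [15.0], [25.0], [40.0]]
actual = [LOW, MEDIUM, HIGH, HIGH]

predicted = [classify_three_group(m, MODELS) for m in modules]
print(predicted)
print(expected_cost(actual, predicted, COSTS))
```

The cost table is asymmetric on purpose: labelling a high-risk module as low-risk is assumed to cost more than the reverse, which is the usual reason for comparing models by expected cost rather than by raw error rate.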