Comparative Assessment of Software Quality Classification Techniques: An Empirical Case Study

  • Authors:
  • Taghi M. Khoshgoftaar; Naeem Seliya

  • Affiliation (both authors):
  • Empirical Software Engineering Laboratory, Department of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL 33431, USA (taghi@cse.fau.edu)

  • Venue:
  • Empirical Software Engineering
  • Year:
  • 2004

Abstract

Software metrics-based quality classification models predict a software module as either fault-prone (fp) or not fault-prone (nfp). Timely application of such models can help direct quality improvement efforts to the modules most likely to be fp during operations, thereby cost-effectively utilizing software quality testing and enhancement resources. Since several classification techniques are available, a comparative study of commonly used techniques can be useful to practitioners. We present a comprehensive evaluation of the relative performances of seven classification techniques and tools: logistic regression, case-based reasoning, classification and regression trees (CART), tree-based classification with S-PLUS, and the Sprint-Sliq, C4.5, and Treedisc algorithms. The expected cost of misclassification (ECM) is introduced as a single, unified measure for comparing the performances of different software quality classification models. A function of the costs of Type I (an nfp module misclassified as fp) and Type II (an fp module misclassified as nfp) misclassifications, ECM is computed for different cost ratios. Evaluating software quality classification models in the presence of varying cost ratios is important, because the usefulness of a model depends on the system-specific costs of misclassification. Moreover, models should be compared and preferred for cost ratios that fall within the range of interest for the given system and project domain. Software metrics were collected from four successive releases of a large legacy telecommunications system. A two-way ANOVA randomized complete block design is used, in which the system release is treated as a block and the modeling method as a factor. It is observed that the predictive performances of the models differ significantly across the system releases, implying that, in the software engineering domain, prediction models are influenced by the characteristics of the data and the system being modeled. Multiple pairwise comparisons are performed to evaluate the relative performances of the seven models for the cost ratios of interest to the case study. In addition, the seven classification techniques are compared with a classification based on lines of code. The comparative approach presented in this paper can also be applied to other software systems.
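
The abstract defines ECM only as a function of the Type I and Type II misclassification costs. A minimal sketch follows, assuming the normalized form common in the authors' related work, NECM = (N_I + c * N_II) / N with the Type I cost fixed at 1 and cost ratio c = C_II / C_I; the confusion-matrix counts and model names below are hypothetical, not results from the paper:

```python
def expected_cost_of_misclassification(n_type1, n_type2, n_total, cost_ratio):
    """Normalized ECM, assuming Type I cost = 1 and Type II cost = cost_ratio.

    n_type1    -- nfp modules misclassified as fp (Type I errors)
    n_type2    -- fp modules misclassified as nfp (Type II errors)
    n_total    -- total number of modules classified
    cost_ratio -- c = C_II / C_I, the system-specific cost ratio
    """
    return (n_type1 + cost_ratio * n_type2) / n_total

# Hypothetical counts for two competing models over 1000 modules.
models = {"CART": (40, 12), "C4.5": (55, 8)}
n_total = 1000

# Comparing across several cost ratios shows why the preferred model
# can change with c: heavier Type II penalties favor the model with
# fewer Type II errors, even at the expense of more Type I errors.
for ratio in (10, 25, 50):
    for name, (t1, t2) in models.items():
        necm = expected_cost_of_misclassification(t1, t2, n_total, ratio)
        print(f"c={ratio:3d}  {name:5s}  NECM={necm:.3f}")
```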
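The two-way randomized complete block analysis (release as block, modeling method as factor) can be sketched with statsmodels; the paper does not specify its tooling, and the data frame values below are placeholders, one NECM value per (technique, release) cell:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Placeholder layout mirroring the design: each cell holds one
# performance value for a (technique, release) combination.
df = pd.DataFrame({
    "technique": ["CART", "CART", "C4.5", "C4.5", "logistic", "logistic"],
    "release":   [1, 2, 1, 2, 1, 2],
    "necm":      [0.16, 0.21, 0.14, 0.19, 0.18, 0.23],
})

# The block (release) enters the model additively; technique is the
# factor of interest, as in the paper's design.
fit = ols("necm ~ C(technique) + C(release)", data=df).fit()
print(sm.stats.anova_lm(fit, typ=2))
```

A significant release term in such a table would correspond to the paper's observation that model performance differs across system releases.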