Software quality analysis by combining multiple projects and learners

Authors:
Taghi M. Khoshgoftaar;Pierre Rebours;Naeem Seliya
Affiliations:
Computer Science and Engineering, Florida Atlantic University, Boca Raton, USA 33431;Computer Science and Engineering, Florida Atlantic University, Boca Raton, USA 33431;Computer and Information Science, University of Michigan-Dearborn, Dearborn, USA 48128
Venue:
Software Quality Control
Year:
2009

Abstract

When building software quality models, the usual approach is to train data mining learners on a single fit dataset. Typically, this fit dataset contains software metrics collected during a past release of the software project whose quality we want to predict. To improve the predictive accuracy of such quality models, it is common practice to combine the predictions of multiple learners to take advantage of their respective biases. Although multi-learner classifiers have proven successful in some cases, the improvement is not always significant because the information in the fit dataset can be insufficient. We present an innovative method to build software quality models that uses majority voting to combine the predictions of multiple learners induced on multiple training datasets. To our knowledge, no previous study in software quality has attempted to take advantage of the multiple software project data repositories that are generally spread across an organization. In a large-scale empirical study involving seven real-world datasets and seventeen learners, we show that, on average, combining the predictions of one learner trained on multiple datasets significantly improves predictive performance compared to one learner induced on a single fit dataset. We also demonstrate empirically that combining multiple learners trained on a single training dataset does not significantly improve the average predictive accuracy compared to a single learner induced on a single fit dataset.
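
The idea of inducing one learner on several training datasets and combining the resulting predictions by majority voting can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the toy threshold learner, the single-metric datasets, the label names "fp" and "nfp", and the rule that ties go to "fp" are all assumptions made for the example.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Label = str  # assumed labels: "fp" (fault-prone) or "nfp" (not fault-prone)
Dataset = List[Tuple[float, Label]]  # hypothetical: (one metric value, label)


def majority_vote(votes: Sequence[Label], tie_break: Label = "fp") -> Label:
    """Return the most frequent label; ties go to tie_break (an assumption)."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]


def train_threshold_learner(dataset: Dataset) -> Callable[[float], Label]:
    """Toy stand-in for a real learner: split halfway between the class means."""
    fp = [x for x, y in dataset if y == "fp"]
    nfp = [x for x, y in dataset if y == "nfp"]
    threshold = (sum(fp) / len(fp) + sum(nfp) / len(nfp)) / 2
    return lambda x: "fp" if x >= threshold else "nfp"


def combine(models: Sequence[Callable[[float], Label]],
            modules: Sequence[float]) -> List[Label]:
    """Predict each module with every model, then take the majority vote."""
    return [majority_vote([model(x) for model in models]) for x in modules]


# Three made-up fit datasets, standing in for data from different projects.
datasets: List[Dataset] = [
    [(10, "nfp"), (20, "nfp"), (80, "fp"), (90, "fp")],     # threshold 50
    [(5, "nfp"), (15, "nfp"), (40, "fp"), (60, "fp")],      # threshold 30
    [(30, "nfp"), (50, "nfp"), (100, "fp"), (120, "fp")],   # threshold 75
]

# One learner type, induced once per dataset.
models = [train_threshold_learner(d) for d in datasets]

# Metric values of new, unlabeled modules to classify.
predictions = combine(models, [20, 40, 60, 90])
print(predictions)
```

In this sketch the three models disagree on the modules with metric values 40 and 60, and the vote resolves each case in favor of the label chosen by two of the three models. Combining several learner types as well would only require adding more trained models to the `models` list.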