Evolutionary Optimization of Software Quality Modeling with Multiple Repositories

Authors:
Yi Liu; Taghi M. Khoshgoftaar; Naeem Seliya
Affiliations:
Georgia College & State University, Milledgeville, GA; Florida Atlantic University, Boca Raton, FL; University of Michigan-Dearborn, Dearborn, MI
Venue:
IEEE Transactions on Software Engineering
Year:
2010

Citing: 0
Cited by: 7

Assessing the maintainability of software product line feature models using structural metrics (Software Quality Control)
Customization support for CBR-based defect prediction (Proceedings of the 7th International Conference on Predictive Models in Software Engineering)
Transfer learning for cross-company software defect prediction (Information and Software Technology)
Comparing the performance of fault prediction models which report multiple performance measures: recomputing the confusion matrix (Proceedings of the 8th International Conference on Predictive Models in Software Engineering)
Empirical evaluation of the effects of mixed project data on learning defect predictors (Information and Software Technology)
Prediction of faults-slip-through in large software projects: an empirical evaluation (Software Quality Control)
DConfusion: a technique to allow cross study performance evaluation of fault prediction studies (Automated Software Engineering)

Abstract

A novel search-based approach to software quality modeling with multiple software project repositories is presented. A software quality model trained on only one software measurement and defect data set may not effectively capture the quality trends of the development organization. Including additional software projects in the training process can provide a cross-project perspective on software quality modeling and prediction. The genetic-programming-based approach includes three strategies for modeling with multiple software projects: Baseline Classifier, Validation Classifier, and Validation-and-Voting Classifier. A case study of software metrics and defect data from seven real-world systems shows that the last of these, the Validation-and-Voting Classifier, provides better generalization and more robust software quality models. A second case study considers 17 different (nonevolutionary) machine learners for modeling with multiple software data sets. Both case studies use a similar majority-voting approach for predicting the fault-proneness class of program modules. The total cost of misclassification of the search-based software quality models is shown to be consistently lower than that of the non-search-based models. This study provides clear guidance to practitioners interested in exploiting their organization's software measurement data repositories for improved software quality modeling.
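
The abstract mentions majority voting over models and a total cost of misclassification, but gives no formulas. The following is a minimal illustrative sketch of those two ideas only, not the authors' implementation. The label names, the tie-breaking rule (ties go to fault-prone), the function names, and the cost values are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical class labels: fault-prone and not fault-prone.
FP, NFP = "fp", "nfp"


def majority_vote(predictions):
    """Return the label chosen by most models.

    Assumption: a tie is resolved in favor of fault-prone, on the
    reasoning that missing a faulty module is usually costlier.
    """
    counts = Counter(predictions)
    return FP if counts[FP] >= counts[NFP] else NFP


def ensemble_predict(model_outputs):
    """Combine per-model predictions into one prediction per module.

    model_outputs[i][j] is the label model i assigns to module j.
    """
    return [majority_vote(votes) for votes in zip(*model_outputs)]


def total_misclassification_cost(actual, predicted,
                                 cost_missed=10.0, cost_false_alarm=1.0):
    """Sum the cost of every misclassified module.

    cost_missed: a fault-prone module predicted as not fault-prone.
    cost_false_alarm: a not fault-prone module predicted as fault-prone.
    Both default values are placeholders, not figures from the paper.
    """
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == FP and p == NFP:
            total += cost_missed
        elif a == NFP and p == FP:
            total += cost_false_alarm
    return total


if __name__ == "__main__":
    # Three models (e.g. trained on different project data sets),
    # each classifying the same three modules.
    outputs = [
        [FP, NFP, NFP],
        [FP, FP, NFP],
        [NFP, FP, NFP],
    ]
    actual = [FP, NFP, FP]
    predicted = ensemble_predict(outputs)
    print(predicted)
    print(total_misclassification_cost(actual, predicted))
```

In this toy run the ensemble raises one false alarm and misses one fault-prone module, so the asymmetric costs dominate the total. Comparing this total across modeling approaches is the kind of evaluation the abstract describes.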