Software defect prediction using semi-supervised learning with dimension reduction

Authors:
Huihua Lu; Bojan Cukic; Mark Culp
Affiliations:
West Virginia University, USA; West Virginia University, USA; West Virginia University, USA
Venue:
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
Year:
2012

Abstract

Accurate detection of fault-prone modules offers a path to high-quality software products while minimizing nonessential assurance expenditures. This type of quality modeling requires software modules with known fault content that were developed in a similar environment. Establishing whether a module contains a fault can be expensive. The basic idea behind semi-supervised learning is to learn from a small number of software modules with known fault content and to supplement model training with modules for which fault information is not available. In this study, we investigate the performance of semi-supervised learning for software fault prediction. A preprocessing strategy, multidimensional scaling, is embedded in the approach to reduce the dimensional complexity of software metrics. Our results show that the semi-supervised learning algorithm with dimension reduction performs significantly better than one of the best-performing supervised learning algorithms, random forest, when few modules with known fault content are available for training.
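
The semi-supervised idea described in the abstract can be illustrated with a generic self-training loop. The sketch below is not the algorithm evaluated in the paper: it assumes a simple nearest-centroid classifier in place of the paper's learner, omits the multidimensional scaling step, and uses made-up two-metric modules (lines of code, cyclomatic complexity). The names `self_train`, `predict`, and `centroid` are hypothetical helpers introduced only for this illustration.

```python
import math


def centroid(rows):
    """Component-wise mean of a non-empty list of feature tuples."""
    n = len(rows)
    return tuple(sum(col) / n for col in zip(*rows))


def fit_centroids(labeled):
    """One centroid per class (0 = not fault-prone, 1 = fault-prone).

    Assumes the labeled set contains at least one module of each class.
    """
    return {y: centroid([x for x, lab in labeled if lab == y]) for y in (0, 1)}


def predict(cents, x):
    """Assign x to the class whose centroid is nearest."""
    return 0 if math.dist(x, cents[0]) <= math.dist(x, cents[1]) else 1


def self_train(labeled, unlabeled, rounds=10, batch=2):
    """Generic self-training: repeatedly pseudo-label the most confident
    unlabeled modules and add them to the training set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        cents = fit_centroids(labeled)
        # Confidence = gap between the distances to the two centroids.
        scored = sorted(
            pool,
            key=lambda x: abs(math.dist(x, cents[0]) - math.dist(x, cents[1])),
            reverse=True,
        )
        for x in scored[:batch]:
            labeled.append((x, predict(cents, x)))
            pool.remove(x)
    return fit_centroids(labeled)


# Hypothetical data: (lines of code, cyclomatic complexity).
labeled = [((10, 1), 0), ((200, 15), 1)]          # few modules with known fault content
unlabeled = [(12, 2), (15, 1), (180, 12),          # modules with unknown fault content
             (220, 20), (20, 3), (190, 14)]

model = self_train(labeled, unlabeled)
print(predict(model, (25, 2)), predict(model, (210, 18)))
```

In this toy setting the two labeled modules alone define the initial centroids, and the unlabeled modules refine them over successive rounds. A real study would use far more metrics per module, which is where a dimension-reduction step such as multidimensional scaling would come in before training.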