Mining Static Code Metrics for a Robust Prediction of Software Defect-Proneness

Authors:
Lianfa Li;Hareton Leung
Venue:
ESEM '11 Proceedings of the 2011 International Symposium on Empirical Software Engineering and Measurement
Year:
2011

Abstract

Defect-proneness prediction is affected by multiple factors, including sampling bias, non-metric factors, and model uncertainty. These factors often contribute to prediction uncertainty and increase the variance of predictions. This paper proposes two methods of mining static code metrics to enhance defect-proneness prediction. Given that little non-metric or qualitative information is extracted from software code, we first suggest using a robust unsupervised learning method, shared nearest neighbors (SNN), to extract similarity patterns from the code metrics. These patterns indicate characteristics shared by the components of a cluster, which may lead to the introduction of similar defects. Using the similarity patterns together with the code metrics as predictors may improve defect-proneness prediction. The second method uses Occam's window and Bayesian model averaging to deal with model uncertainty: first, the datasets are used to train and cross-validate multiple learners; then, highly qualified models are selected and integrated into a robust prediction. From a study based on 12 datasets from NASA, we conclude that our proposed solutions can contribute to better defect-proneness prediction.
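
The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the SNN variant, the learners, or the window threshold. The sketch assumes that SNN similarity is the number of k-nearest neighbours two components share, and that Occam's window keeps every model whose posterior probability is within a fixed ratio of the best model. The metric vectors, model names, posterior values, and the ratio of 20 are all hypothetical.

```python
import math


def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbours.append(set(others[:k]))
    return neighbours


def snn_similarity(points, k):
    """SNN similarity matrix: how many k-nearest neighbours two points share."""
    nn = knn_lists(points, k)
    n = len(points)
    return [[len(nn[i] & nn[j]) if i != j else 0 for j in range(n)]
            for i in range(n)]


def occams_window_weights(posteriors, ratio=20.0):
    """Drop models whose posterior is more than `ratio` times smaller than
    the best one, then renormalise the remaining posteriors into weights."""
    best = max(posteriors.values())
    kept = {m: p for m, p in posteriors.items() if p * ratio >= best}
    total = sum(kept.values())
    return {m: p / total for m, p in kept.items()}


def bma_predict(weights, model_probs):
    """Weighted average of the kept models' predicted defect probabilities."""
    return sum(w * model_probs[m] for m, w in weights.items())


# Hypothetical components described by two static code metrics each.
metrics = [(10, 2), (11, 2), (12, 3), (50, 9), (52, 10), (51, 9)]
similarity = snn_similarity(metrics, k=2)

# Hypothetical posterior model probabilities, e.g. derived from cross-validation.
posteriors = {"logistic": 0.60, "naive_bayes": 0.38, "tree": 0.02}
weights = occams_window_weights(posteriors)

# Hypothetical per-model defect probabilities for one component.
model_probs = {"logistic": 0.8, "naive_bayes": 0.6, "tree": 0.1}
defect_probability = bma_predict(weights, model_probs)
```

In this toy data, the components with similar metric values share neighbours while components from the two distant groups share none, and the weakest model falls outside the window, so only the remaining two contribute to the averaged prediction.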