Ensemble of missing data techniques to improve software prediction accuracy
Proceedings of the 28th international conference on Software engineering
A primary concern of software engineering is prediction accuracy. We use datasets to build and validate prediction systems, for example of software development effort. However, it is not uncommon for these datasets to contain missing values. When machine learning techniques are used to build such prediction systems, handling incomplete data is an important issue for classifier learning, since missing values in the training set, the test set, or both can degrade prediction accuracy. Many works in machine learning and statistics have shown that combining individual classifiers into an ensemble is an effective way to improve classification accuracy. We investigate this ensemble strategy in the context of incomplete data and software prediction, and propose BAMINNSI, an ensemble Bayesian multiple imputation and nearest neighbour single imputation method that constructs ensembles from these two imputation methods. Experiments using decision trees on two benchmark industrial datasets give strong results that support the method.
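The abstract does not specify how BAMINNSI builds or combines its ensemble members, so the Python sketch below only illustrates the general idea of training classifiers on several imputed copies of the data and voting. It rests on loudly stated simplifications: k-nearest-neighbour single imputation, repeated random-draw (hot-deck style) imputation as a crude stand-in for Bayesian multiple imputation, a one-split decision stump in place of a full decision tree, and majority voting. The function names (`knn_impute`, `random_draw_impute`, `fit_stump`, `ensemble_fit`), their parameters (`k`, `n_draws`, `seed`) and the toy data are all hypothetical; this is not the authors' implementation.

```python
import random
from collections import Counter
from typing import Callable, List, Optional

Row = List[Optional[float]]  # None marks a missing value


def knn_impute(rows: List[Row], k: int = 3) -> List[List[float]]:
    """Single imputation: replace each missing value with the mean of that
    feature over the k nearest rows that observe it. Distance is the mean
    squared difference over features observed in both rows. Assumes every
    feature is observed in at least one other row."""
    filled = []
    for i, row in enumerate(rows):
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(v)
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [(a - b) ** 2 for a, b in zip(row, other)
                          if a is not None and b is not None]
                dist = sum(shared) / len(shared) if shared else float("inf")
                candidates.append((dist, other[j]))
            candidates.sort()
            nearest = [val for _, val in candidates[:k]]
            new_row.append(sum(nearest) / len(nearest))
        filled.append(new_row)
    return filled


def random_draw_impute(rows: List[Row], rng: random.Random) -> List[List[float]]:
    """Stochastic single imputation: fill each gap with a random observed
    value of the same feature. Calling it repeatedly yields several completed
    datasets (a crude stand-in for Bayesian multiple imputation)."""
    observed = [[v for v in col if v is not None] for col in zip(*rows)]
    return [[v if v is not None else rng.choice(observed[j])
             for j, v in enumerate(row)] for row in rows]


def fit_stump(X: List[List[float]], y: List[str]) -> Callable[[List[float]], str]:
    """One-split decision tree (stump), used here in place of a full tree.
    Keeps the first split with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[j] <= t]
            right = [lab for x, lab in zip(X, y) if x[j] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, j, t, l_lab, r_lab)
    if best is None:  # every feature is constant: predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return lambda x: majority
    _, j, t, l_lab, r_lab = best
    return lambda x: l_lab if x[j] <= t else r_lab


def ensemble_fit(rows: List[Row], y: List[str], n_draws: int = 5,
                 k: int = 3, seed: int = 0) -> Callable[[List[float]], str]:
    """Train one stump per completed dataset (one kNN imputation plus
    n_draws random-draw imputations) and combine them by majority vote.
    Query rows must be fully observed in this sketch."""
    rng = random.Random(seed)
    datasets = [knn_impute(rows, k)]
    datasets += [random_draw_impute(rows, rng) for _ in range(n_draws)]
    models = [fit_stump(X, y) for X in datasets]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]


# Invented toy data: feature 0 separates the classes, feature 1 has gaps.
train_rows = [[1.0, 5.0], [2.0, None], [3.0, 4.0],
              [7.0, None], [8.0, 6.0], [9.0, 5.5]]
labels = ["low", "low", "low", "high", "high", "high"]
predict = ensemble_fit(train_rows, labels)
print(predict([1.5, 5.0]), predict([8.5, 5.0]))  # low high
```

Each imputed copy of the training set can yield a slightly different classifier, so the vote spreads the decision over imputation uncertainty instead of committing to a single filled-in dataset. Missing values in query rows, which the abstract also mentions, would need an extra imputation step at prediction time that this sketch omits.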