Ensemble of missing data techniques to improve software prediction accuracy

  • Authors:
  • Bhekisipho Twala;Michelle Cartwright;Martin Shepperd

  • Affiliations:
  • Brunel University, United Kingdom;Brunel University, United Kingdom;Brunel University, United Kingdom

  • Venue:
  • Proceedings of the 28th international conference on Software engineering
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Software engineers are commonly faced with the problem of incomplete data. Incomplete data can reduce system performance in terms of predictive accuracy. Unfortunately, rare research has been conducted to systematically explore the impact of missing values, especially from the missing data handling point of view. This has made various missing data techniques (MDTs) less significant. This paper describes a systematic comparison of seven MDTs using eight industrial datasets. Our findings from an empirical evaluation suggest listwise deletion as the least effective technique for handling incomplete data while multiple imputation achieves the highest accuracy rates. We further propose and show how a combination of MDTs by randomizing a decision tree building algorithm leads to a significant improvement in prediction performance for missing values up to 50%.