Ensemble Imputation Methods for Missing Software Engineering Data

  • Authors:
  • Bhekisipho Twala; Michelle Cartwright

  • Affiliations:
  • Brunel University; Brunel University

  • Venue:
  • METRICS '05 Proceedings of the 11th IEEE International Software Metrics Symposium
  • Year:
  • 2005

Abstract

Prediction accuracy is a primary concern in software engineering. Datasets are used to build and validate prediction systems, for example for software development effort. However, such datasets commonly contain missing values. When machine learning techniques are used to build prediction systems, the handling of incomplete data is an important issue for classifier learning, because missing values in the training set, the test set, or both can affect prediction accuracy. Many studies in machine learning and statistics have shown that combining individual classifiers into an ensemble is an effective technique for improving classification accuracy. This paper investigates the ensemble strategy in the context of incomplete data and software prediction. It proposes BAMINNSI, an ensemble of Bayesian multiple imputation and nearest neighbour single imputation, which constructs ensembles based on these two imputation methods. Strong results on two benchmark industrial datasets using decision trees support the method.
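
The general idea of building an ensemble from two imputation methods can be illustrated with a minimal sketch. This is not the authors' BAMINNSI implementation, and the abstract does not say how the ensemble members are combined. The sketch makes the following assumptions: missing values are marked with None, labels are binary (0/1), multiple imputation is approximated by random draws from a normal distribution fitted to each column's observed values (a crude stand-in for Bayesian multiple imputation), each ensemble member is a one-level decision tree (a stump) rather than a full decision tree, and members are combined by majority vote. All function names are hypothetical.

```python
import random
from statistics import mean, stdev

def nn_impute(rows):
    """Nearest-neighbour single imputation: fill each missing value (None)
    from the closest complete row, measured on the observed features.
    Assumes at least one complete row exists."""
    complete = [r for r in rows if None not in r]
    out = []
    for r in rows:
        obs = [j for j, v in enumerate(r) if v is not None]
        donor = min(complete, key=lambda c: sum((c[j] - r[j]) ** 2 for j in obs))
        out.append([donor[j] if v is None else v for j, v in enumerate(r)])
    return out

def multiple_impute(rows, m=4, seed=0):
    """Stand-in for multiple imputation (an assumption, not a Bayesian model):
    return m completed copies, drawing each missing value from a normal
    distribution fitted to the observed values of its column."""
    rng = random.Random(seed)
    stats = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows if r[j] is not None]
        stats.append((mean(col), stdev(col) if len(col) > 1 else 0.0))
    return [[[rng.gauss(*stats[j]) if v is None else v for j, v in enumerate(r)]
             for r in rows] for _ in range(m)]

def fit_stump(X, y):
    """One-level decision tree: pick the (feature, threshold, left, right)
    split with the lowest training error on binary labels."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for left, right in ((0, 1), (1, 0)):
                err = sum((left if x[j] <= t else right) != label
                          for x, label in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, j, t, left, right)
    return best[1:]

def predict_stump(stump, x):
    j, t, left, right = stump
    return left if x[j] <= t else right

def fit_ensemble(rows, y, m=4, seed=0):
    """Train one stump per completed dataset: 1 from nearest-neighbour
    imputation plus m from the multiple-imputation stand-in."""
    datasets = [nn_impute(rows)] + multiple_impute(rows, m, seed)
    return [fit_stump(X, y) for X in datasets]

def predict_ensemble(stumps, x):
    """Majority vote over the ensemble members (x must be complete)."""
    votes = sum(predict_stump(s, x) for s in stumps)
    return 1 if 2 * votes > len(stumps) else 0

# Toy data (invented for illustration only): two features, second one incomplete.
rows = [[1.0, 10.0], [2.0, None], [1.5, 12.0],
        [8.0, 50.0], [9.0, None], [8.5, 55.0]]
y = [0, 0, 0, 1, 1, 1]
stumps = fit_ensemble(rows, y)
print(predict_ensemble(stumps, [1.2, 11.0]))
print(predict_ensemble(stumps, [8.8, 52.0]))
```

With the default m=4 the ensemble has five members, so the majority vote cannot tie. A closer approximation to the approach described in the abstract would replace the stumps with full decision trees and the normal draws with a proper Bayesian imputation model.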