Learning from incomplete data with infinite imputations

  • Authors:
  • Uwe Dick;Peter Haider;Tobias Scheffer

  • Affiliations:
  • Max Planck Institute for Computer Science, Saarbrücken, Germany;Max Planck Institute for Computer Science, Saarbrücken, Germany;Max Planck Institute for Computer Science, Saarbrücken, Germany

  • Venue:
  • Proceedings of the 25th International Conference on Machine Learning
  • Year:
  • 2008

Abstract

We address the problem of learning decision functions from training data in which some attribute values are unobserved. This problem can arise, for instance, when training data is aggregated from multiple sources, and some sources record only a subset of attributes. We derive a generic joint optimization problem in which the distribution governing the missing values is a free parameter. We show that the optimal solution concentrates the density mass on finitely many imputations, and provide a corresponding algorithm for learning from incomplete data. We report on empirical results on benchmark data, and on the email spam application that motivates our work.
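
To make the idea of a joint optimization over a decision function and an imputation distribution concrete, here is a minimal Python sketch. It is a simplified illustration under stated assumptions, not the algorithm from the paper: the imputation distribution is restricted in advance to a small finite set of candidate completions of the data, the loss is squared error with a ridge penalty, and the weights are kept near uniform by a KL penalty. The names `joint_objective`, `fit_with_weighted_imputations`, `lam`, and `tau` are hypothetical and chosen only for this example.

```python
import numpy as np


def joint_objective(candidates, y, beta, w, lam, tau):
    """Weighted squared loss + ridge penalty + KL(w || uniform) penalty."""
    K = candidates.shape[0]
    losses = np.array([np.sum((Xk @ beta - y) ** 2) for Xk in candidates])
    w_safe = np.clip(w, 1e-300, None)  # avoids log(0) when a weight underflows
    kl = np.sum(w * np.log(K * w_safe))
    return float(w @ losses + lam * beta @ beta + tau * kl)


def fit_with_weighted_imputations(candidates, y, lam=1.0, tau=1.0, n_iter=20):
    """Alternate between the model and the weights over candidate imputations.

    candidates: array of shape (K, n, d), K fully imputed copies of the data
                (assumption: the candidate completions are given up front).
    """
    candidates = np.asarray(candidates, dtype=float)
    y = np.asarray(y, dtype=float)
    K, n, d = candidates.shape
    w = np.full(K, 1.0 / K)  # start from uniform weights
    history = []
    for _ in range(n_iter):
        # (a) With w fixed, the weighted ridge problem has a closed form.
        A = lam * np.eye(d)
        b = np.zeros(d)
        for k in range(K):
            A += w[k] * candidates[k].T @ candidates[k]
            b += w[k] * candidates[k].T @ y
        beta = np.linalg.solve(A, b)
        # (b) With beta fixed, the KL-penalised weights are a softmax of -loss/tau.
        losses = np.array([np.sum((Xk @ beta - y) ** 2) for Xk in candidates])
        logits = -losses / tau
        logits -= logits.max()  # numerical stability
        w = np.exp(logits)
        w /= w.sum()
        history.append(joint_objective(candidates, y, beta, w, lam, tau))
    return beta, w, history


# Toy usage: one attribute is unobserved in some rows; try three fill-in values.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(40, 3))
y_demo = X_full @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.normal(size=40)
missing = rng.random(40) < 0.3  # rows where column 2 is unobserved
demo_candidates = []
for fill in (-1.0, 0.0, 1.0):
    Xk = X_full.copy()
    Xk[missing, 2] = fill
    demo_candidates.append(Xk)
demo_candidates = np.stack(demo_candidates)
beta_demo, w_demo, hist_demo = fit_with_weighted_imputations(
    demo_candidates, y_demo, lam=0.1, tau=5.0
)
```

Each of the two steps minimizes the same objective exactly with the other variable held fixed, so the recorded objective values do not increase from one iteration to the next. A smaller `tau` lets the weights concentrate on fewer candidates; a larger `tau` keeps them closer to uniform.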