A Novel Scalable and Data Efficient Feature Subset Selection Algorithm

  • Authors:
  • Sergio Rodrigues De Morais;Alex Aussem

  • Affiliations:
  • INSA-Lyon, LIESP, Villeurbanne, France F-69622;Université de Lyon 1, LIESP, Villeurbanne, France F-69622

  • Venue:
  • ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II
  • Year:
  • 2008

Abstract

In this paper, we aim to identify the minimal subset of discrete random variables that is relevant for probabilistic classification in data sets with many variables but few instances. A principled solution to this problem is to determine the Markov boundary of the class variable. We present a novel scalable, data-efficient, and correct Markov boundary learning algorithm under the so-called faithfulness condition. We report extensive empirical experiments on synthetic and real data sets scaling up to 139,351 variables.
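
To illustrate what a Markov boundary is, the following minimal sketch reads it off a known directed acyclic graph. Under the faithfulness condition, the Markov boundary of a variable consists of its parents, its children, and the other parents of its children (spouses). This is not the learning algorithm of the paper, which has to recover the boundary from data without knowing the graph. The function name `markov_boundary`, the parent-set dictionary format, and the toy graph are assumptions made for illustration only.

```python
def markov_boundary(parents, target):
    """Return the parents, children, and spouses of `target` in a DAG.

    `parents` maps every node to the set of its parents
    (a hypothetical input format chosen for this sketch).
    """
    pa = set(parents.get(target, ()))
    # Children are the nodes that list `target` among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = set()
    for child in children:
        spouses |= set(parents[child])
    spouses.discard(target)
    return pa | children | spouses


# Toy DAG (illustrative only): A -> T, T -> C, S -> C, C -> D, X isolated.
toy_dag = {
    "A": set(),
    "S": set(),
    "X": set(),
    "T": {"A"},
    "C": {"T", "S"},
    "D": {"C"},
}

boundary = markov_boundary(toy_dag, "T")
print(sorted(boundary))  # ['A', 'C', 'S']
```

In this toy graph, D is a descendant of T and X is unrelated to it, yet neither belongs to the boundary: once A, C, and S are known, they carry no further information about T. This is why the Markov boundary is a natural target for feature subset selection.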