Active sampling for detecting irrelevant features

  • Authors: Sriharsha Veeramachaneni; Emanuele Olivetti; Paolo Avesani

  • Affiliations: Istituto per la ricerca scientifica e tecnologica (ITC-IRST), Trento, Italy (all three authors)

  • Venue: ICML '06: Proceedings of the 23rd International Conference on Machine Learning
  • Year: 2006

Abstract

Active learning is the general approach of automatically driving data collection using information from previously acquired data. Traditional active learning addresses the problem of choosing the unlabeled examples for which class labels are queried, with the goal of learning a classifier. In contrast, we address the problem of active feature sampling for detecting useless features. We propose a strategy to actively sample the values of new features on class-labeled examples, with the objective of assessing feature relevance. We derive an active feature sampling algorithm from an information-theoretic and statistical formulation of the problem. We present experimental results on synthetic, UCI, and real-world datasets to demonstrate that our active sampling algorithm can provide accurate estimates of feature relevance at lower data acquisition cost than random sampling and other previously proposed sampling algorithms.
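
To make the setting concrete, the following is a minimal Python sketch of the random-sampling baseline mentioned in the abstract, not the authors' active algorithm. It assumes discrete feature values and uses a plug-in mutual information estimate as the relevance score; that choice of score, along with the names `mutual_information`, `random_sampling_estimate`, and `query_feature`, is an illustrative assumption and does not come from the paper.

```python
import math
import random
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(F; C) in bits from (feature_value, class_label) pairs.

    Assumption: mutual information is used here as an illustrative relevance
    score; the abstract only says the formulation is information theoretic.
    """
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    f_counts = Counter(f for f, _ in pairs)
    c_counts = Counter(c for _, c in pairs)
    mi = 0.0
    for (f, c), n_fc in joint.items():
        # p(f, c) * log2( p(f, c) / (p(f) * p(c)) ), written with raw counts
        mi += (n_fc / n) * math.log2(n_fc * n / (f_counts[f] * c_counts[c]))
    return mi


def random_sampling_estimate(labels, query_feature, budget, seed=0):
    """Random-sampling baseline: query the feature on `budget` examples chosen
    uniformly at random among the class-labeled examples, then score relevance.

    `query_feature(i)` is a hypothetical callback standing in for the costly
    acquisition of the new feature's value on example i.
    """
    rng = random.Random(seed)
    chosen = rng.sample(range(len(labels)), budget)
    pairs = [(query_feature(i), labels[i]) for i in chosen]
    return mutual_information(pairs)


if __name__ == "__main__":
    labels = [0, 1] * 50
    relevant = lambda i: labels[i]   # feature that copies the class label
    irrelevant = lambda i: 0         # constant feature, carries no information
    print(random_sampling_estimate(labels, relevant, budget=20))
    print(random_sampling_estimate(labels, irrelevant, budget=20))
```

An active strategy would replace the uniform choice in `random_sampling_estimate` with a rule that picks the next example to query based on the values already acquired; the abstract does not specify that rule, so it is left out of this sketch.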