Active sampling for detecting irrelevant features

Authors:
Sriharsha Veeramachaneni;Emanuele Olivetti;Paolo Avesani
Affiliations:
Istituto per la ricerca scientifica e tecnologica (ITC-IRST), Trento, Italy;Istituto per la ricerca scientifica e tecnologica (ITC-IRST), Trento, Italy;Istituto per la ricerca scientifica e tecnologica (ITC-IRST), Trento, Italy
Venue:
ICML '06 Proceedings of the 23rd international conference on Machine learning
Year:
2006

Citing 7
Cited 4

Query by committee

COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
Information-based objective functions for active data selection

Neural Computation
Support Vector Machine Active Learning with Applications to Text Classification

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
On Active Learning for Data Acquisition

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Active learning with statistical models

Journal of Artificial Intelligence Research
Budgeted learning of naive-bayes classifiers

UAI'03 Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence
Active sampling for knowledge discovery from biomedical data

PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases

Active Feature-Value Acquisition

Management Science
Active learning for directed exploration of complex systems

ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Active Learning of Instance-Level Constraints for Semi-supervised Document Clustering

WI-IAT '09 Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 01
New algorithms for budgeted learning

Machine Learning

Abstract

Active learning is the general approach of using information from previously acquired data to automatically drive further data collection. Traditional active learning addresses the problem of choosing which unlabeled examples to query for class labels, with the goal of learning a classifier. In contrast, we address the problem of active feature sampling for detecting useless features. We propose a strategy to actively sample the values of new features on class-labeled examples, with the objective of assessing feature relevance. We derive an active feature sampling algorithm from an information-theoretic and statistical formulation of the problem. We present experimental results on synthetic, UCI, and real-world datasets to demonstrate that our active sampling algorithm can provide accurate estimates of feature relevance at lower data acquisition cost than random sampling and other previously proposed sampling algorithms.
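The setting in the abstract can be made concrete with a small sketch of the random sampling baseline it mentions. This is not the authors' algorithm. It assumes, purely for illustration, that feature relevance is measured by a plug-in estimate of the mutual information between a discrete feature and the class label; the abstract does not name the relevance measure. The names `mutual_information`, `random_sampling_estimate`, and `query_feature` are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(pairs):
    """Plug-in estimate of I(F; C) in bits from (feature_value, class_label) pairs.

    Assumption: mutual information stands in for "feature relevance";
    the abstract does not specify the measure.
    """
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    f_counts = Counter(f for f, _ in pairs)
    c_counts = Counter(c for _, c in pairs)
    mi = 0.0
    for (f, c), n_fc in joint.items():
        # p(f, c) * log2( p(f, c) / (p(f) * p(c)) ), written with raw counts
        mi += (n_fc / n) * math.log2(n_fc * n / (f_counts[f] * c_counts[c]))
    return mi


def random_sampling_estimate(labels, query_feature, budget, seed=0):
    """Baseline: pay to observe the feature on `budget` randomly chosen
    class-labeled examples, then estimate relevance from those observations.

    `query_feature(i)` is a hypothetical callback returning the (costly)
    feature value of example i.
    """
    rng = random.Random(seed)
    chosen = rng.sample(range(len(labels)), budget)
    pairs = [(query_feature(i), labels[i]) for i in chosen]
    return mutual_information(pairs)
```

An active strategy would replace the uniform `rng.sample` call with a rule that uses the feature values observed so far to pick the next example to query. The specific rule derived in the paper is not given in the abstract, so it is not reproduced here.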