An Expected Utility Approach to Active Feature-Value Acquisition

Authors:
Prem Melville; Maytal Saar-Tsechansky; Foster Provost; Raymond Mooney
Affiliations:
University of Texas at Austin; University of Texas at Austin; New York University; University of Texas at Austin
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Abstract

In many classification tasks, training data have missing feature values that can be acquired at a cost. When building accurate predictive models, acquiring all missing values is often prohibitively expensive or unnecessary, while acquiring a random subset of feature values may not be the most effective strategy. The goal of active feature-value acquisition is to incrementally select the feature values that are most cost-effective for improving the model's accuracy. We present an approach that acquires feature values for inducing a classification model based on an estimate of the expected improvement in model accuracy per unit cost. Experimental results demonstrate that our approach consistently reduces the cost of producing a model of a desired accuracy compared to random feature acquisition.
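
The selection criterion described in the abstract (expected improvement in model accuracy per unit cost) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the representation of a query as an (instance, feature) pair, and the two estimators passed in as callables (a distribution over the possible values of a missing entry, and the model accuracy if the entry took a given value) are all assumptions made for illustration.

```python
from typing import Callable, Hashable, Iterable, Mapping, Tuple

# Hypothetical representation of a candidate acquisition:
# (instance index, feature name) of one missing entry.
Query = Tuple[int, str]


def expected_utility(
    query: Query,
    value_probs: Mapping[Hashable, float],
    accuracy_if: Callable[[Query, Hashable], float],
    baseline_accuracy: float,
    cost: float,
) -> float:
    """Expected accuracy improvement per unit cost for one missing entry.

    value_probs: assumed estimate of P(entry = v) for each possible value v.
    accuracy_if: assumed estimate of model accuracy if the entry were v.
    """
    if cost <= 0:
        raise ValueError("acquisition cost must be positive")
    # Average the accuracy gain over the possible values of the entry.
    expected_gain = sum(
        p * (accuracy_if(query, v) - baseline_accuracy)
        for v, p in value_probs.items()
    )
    return expected_gain / cost


def select_query(
    candidates: Iterable[Query],
    value_probs_for: Callable[[Query], Mapping[Hashable, float]],
    accuracy_if: Callable[[Query, Hashable], float],
    baseline_accuracy: float,
    cost_of: Callable[[Query], float],
) -> Query:
    """Return the candidate with the highest expected utility.

    Raises ValueError if there are no candidates.
    """
    return max(
        candidates,
        key=lambda q: expected_utility(
            q, value_probs_for(q), accuracy_if, baseline_accuracy, cost_of(q)
        ),
    )
```

In an incremental setting, one would call `select_query` on the currently missing entries, acquire the chosen value, retrain the model, and repeat until the budget is spent. How the value distribution and the resulting accuracy are estimated is left open here; the abstract does not specify them.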