Active Feature-Value Acquisition

Authors:
Maytal Saar-Tsechansky;Prem Melville;Foster Provost
Affiliations:
McCombs School of Business, University of Texas at Austin, Austin, Texas 78712;IBM T.J. Watson Research Center, Yorktown Heights, New York 10598;Stern School of Business, New York University, New York, New York 10012
Venue:
Management Science
Year:
2009

Abstract

Most induction algorithms for building predictive models take as input training data in the form of feature vectors. Acquiring the values of features may be costly, and simply acquiring all values may be wasteful or prohibitively expensive. Active feature-value acquisition (AFA) selects features incrementally in an attempt to improve the predictive model most cost-effectively. This paper presents a framework for AFA based on estimating information value. Although straightforward in principle, estimations and approximations must be made to apply the framework in practice. We present an acquisition policy, sampled expected utility (SEU), that employs particular estimations to enable effective ranking of potential acquisitions in settings where relatively little information is available about the underlying domain. We then present experimental results showing that, compared with the policy of using representative sampling for feature acquisition, SEU reduces the cost of producing a model of a desired accuracy and exhibits consistent performance across domains. We also extend the framework to a more general modeling setting in which feature values as well as class labels are missing and are costly to acquire.
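
The ranking idea behind an expected-utility acquisition policy can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the abstract: the toy naive Bayes learner, the use of training-set accuracy as the performance measure, the class-conditional frequency estimate of a missing value's distribution, the uniform random sampling of candidate queries, and all function names (`train_nb`, `expected_utility`, `seu_select`, and so on). The sketch scores each sampled missing feature value by the expected change in model performance per unit cost, then picks the highest-scoring query.

```python
import random

def train_nb(X, y):
    """Toy naive Bayes over binary features; None marks a missing value."""
    classes = sorted(set(y))
    prior = {c: (y.count(c) + 1) / (len(y) + len(classes)) for c in classes}
    cond = {}
    for c in classes:
        for j in range(len(X[0])):
            vals = [x[j] for x, t in zip(X, y) if t == c and x[j] is not None]
            cond[c, j] = (sum(vals) + 1) / (len(vals) + 2)  # smoothed P(x_j = 1 | c)
    return classes, prior, cond

def predict(model, x):
    """Return the most probable class, skipping missing feature values."""
    classes, prior, cond = model
    def score(c):
        s = prior[c]
        for j, v in enumerate(x):
            if v is not None:
                s *= cond[c, j] if v == 1 else 1 - cond[c, j]
        return s
    return max(classes, key=score)

def accuracy(X, y):
    """Training-set accuracy of a model induced from X (assumed performance measure)."""
    model = train_nb(X, y)
    return sum(predict(model, x) == t for x, t in zip(X, y)) / len(y)

def value_prob(X, y, i, j):
    """Assumed estimator: P(x_ij = 1) from known values of feature j in the same class."""
    vals = [x[j] for x, t in zip(X, y) if t == y[i] and x[j] is not None]
    return (sum(vals) + 1) / (len(vals) + 2)

def expected_utility(X, y, i, j, cost):
    """Expected performance gain per unit cost of acquiring the missing value x_ij."""
    base = accuracy(X, y)
    p1 = value_prob(X, y, i, j)
    eu = 0.0
    for v, p in ((1, p1), (0, 1 - p1)):
        X[i][j] = v  # hypothetically fill in the value and re-induce the model
        eu += p * (accuracy(X, y) - base) / cost
    X[i][j] = None  # restore the missing marker
    return eu

def seu_select(X, y, costs, sample_size, rng):
    """Score a random sample of missing entries and return the best (row, feature)."""
    missing = [(i, j) for i, row in enumerate(X) for j, v in enumerate(row) if v is None]
    candidates = rng.sample(missing, min(sample_size, len(missing)))
    return max(candidates, key=lambda q: expected_utility(X, y, q[0], q[1], costs[q[1]]))

# Hypothetical usage: five labeled instances, two binary features, per-feature costs.
X = [[1, None], [1, 1], [0, 0], [None, 0], [0, None]]
y = [1, 1, 0, 0, 0]
query = seu_select(X, y, costs=[1.0, 2.0], sample_size=3, rng=random.Random(0))
print("next feature value to acquire (row, feature):", query)
```

Scoring only a sample of the missing entries keeps the number of model re-inductions per acquisition step bounded by `sample_size` times the number of possible feature values, at the price of possibly missing the best query outside the sample.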