Toward economic machine learning and utility-based data mining

Authors:
Foster Provost
Affiliations:
New York University, New York, NY
Venue:
UBDM '05 Proceedings of the 1st international workshop on Utility-based data mining
Year:
2005

Cited by:
Machine learning paradigms for utility-based data mining. UBDM '05 Proceedings of the 1st international workshop on Utility-based data mining
Data acquisition and cost-effective predictive modeling: targeting offers for electronic commerce. Proceedings of the ninth international conference on Electronic commerce
Get another label? improving data quality and data mining using multiple, noisy labelers. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Principal-agent learning. Decision Support Systems
Efficiently learning the accuracy of labeling sources for selective sampling. Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Induction over Strategic Agents. Information Systems Research
Repeated labeling using multiple noisy labelers. Data Mining and Knowledge Discovery

Abstract

Data mining requires certain information; supervised learning, for example, requires training data. Some prior research has recognized that this information often does not simply present itself for free, but involves various acquisition costs. In addition, applying the learned models involves costs and benefits. I introduce a general economic setting that includes as special cases the settings of many different streams of prior research, such as cost-sensitive learning, traditional active learning, semi-supervised learning, active feature acquisition, progressive sampling, and budgeted learning; in the general setting, these considerations are inextricably interwoven. For data mining in the general setting I suggest a strategy of maximum expected-utility data acquisition. Finally, I discuss the many open research issues that must be addressed. As a simple example, we must be able to deal with the seemingly straightforward problem of handling missing values in induction and inference.
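
To illustrate the idea of maximum expected-utility data acquisition, here is a minimal Python sketch. It is not taken from the paper. It assumes each candidate acquisition (a label, a feature value, a new example) has a known cost and an estimated distribution over utility gains, and it greedily picks the candidate with the highest expected net utility, acquiring nothing when no candidate is expected to pay for itself. The `Acquisition` class, its fields, the candidate names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Acquisition:
    """A hypothetical candidate piece of information to acquire."""
    name: str
    cost: float
    # (probability, utility gain) pairs; assumed to be estimated elsewhere.
    outcomes: Tuple[Tuple[float, float], ...]


def expected_net_utility(candidate: Acquisition) -> float:
    """Expected utility gain over the outcomes, minus the acquisition cost."""
    expected_gain = sum(p * gain for p, gain in candidate.outcomes)
    return expected_gain - candidate.cost


def choose_acquisition(candidates: Sequence[Acquisition]) -> Optional[Acquisition]:
    """Return the candidate with the highest expected net utility,
    or None if no candidate has a positive expected net utility."""
    best = max(candidates, key=expected_net_utility, default=None)
    if best is None or expected_net_utility(best) <= 0:
        return None
    return best


# Illustrative candidates only; costs and gains are made up.
candidates = [
    Acquisition("label for example 17", cost=1.0,
                outcomes=((0.5, 4.0), (0.5, 0.0))),
    Acquisition("feature value for example 3", cost=0.5,
                outcomes=((0.25, 4.0), (0.75, 0.0))),
    Acquisition("new unlabeled example", cost=2.0,
                outcomes=((1.0, 1.0),)),
]

choice = choose_acquisition(candidates)
print(choice.name if choice else "acquire nothing")
```

The sketch treats labels, feature values, and new examples as interchangeable candidates scored on a single utility scale, which is one way to read the general setting described above. Estimating the outcome distributions is the hard part and is left out here.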