Fast data acquisition in cost-sensitive learning

Authors:
Victor S. Sheng
Affiliations:
Computer Science Department, University of Central Arkansas, Conway, AR
Venue:
ICDM'11 Proceedings of the 11th international conference on Advances in data mining: applications and theoretical aspects
Year:
2011

Abstract

Data acquisition is the first and one of the most important steps in many data mining applications. It is a time-consuming and costly task. Acquiring too few examples makes the learned model and its future predictions inaccurate, while acquiring more examples than necessary wastes time and money. It is therefore very important to estimate the number of examples a learning algorithm needs. However, most previous learning algorithms learn from a given, fixed set of examples. To our knowledge, little previous work in machine learning can dynamically acquire examples as it learns and decide the ideal number of examples needed. In this paper, we propose a simple on-line framework for fast data acquisition (FDA). FDA is an extrapolation method that estimates the number of examples needed in each acquisition and acquires them all at once. Compared with the naïve step-by-step data acquisition strategy, FDA significantly reduces the number of data acquisition and model building rounds. This would significantly reduce the total cost, which comprises the costs of misclassification, data acquisition arrangement, computation, and the acquired examples.
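
The contrast between step-by-step and extrapolation-based acquisition can be illustrated with a small simulation. The abstract does not give FDA's actual extrapolation rule, so everything below is an assumption made for illustration: the synthetic power-law learning curve, the per-example price, and the hypothetical function names `step_by_step` and `extrapolating`. It is a sketch of the general idea, not the paper's algorithm.

```python
import math

# Assumption: misclassification cost follows a synthetic power law a * n^(-b).
# A real learner's curve would be measured by building a model on n examples.
def misclassification_cost(n, a=500.0, b=0.5):
    return a * n ** (-b)

def total_cost(n, price=0.1):
    # Total cost = cost of errors + cost of the acquired examples.
    return misclassification_cost(n) + price * n

def step_by_step(n_start=10, step=1, price=0.1):
    """Naive baseline: acquire `step` examples per round, rebuild the model,
    and stop as soon as the total cost no longer decreases."""
    n, rounds = n_start, 1
    while total_cost(n + step, price) < total_cost(n, price):
        n += step
        rounds += 1
    return n, rounds

def extrapolating(n_start=10, price=0.1):
    """Illustrative extrapolation: fit a power law through two observed points
    of the learning curve, then jump straight to the estimated optimum."""
    n1, n2 = n_start, 2 * n_start
    e1, e2 = misclassification_cost(n1), misclassification_cost(n2)
    b = -math.log(e2 / e1) / math.log(n2 / n1)   # fitted decay exponent
    a = e1 * n1 ** b                             # fitted scale
    # Minimizer of a * n^(-b) + price * n, from setting the derivative to zero.
    n_target = round((a * b / price) ** (1.0 / (b + 1.0)))
    rounds = 3 if n_target > n2 else 2           # two probes plus one final jump
    return max(n_target, n2), rounds

n_naive, rounds_naive = step_by_step()
n_fast, rounds_fast = extrapolating()
print(f"step-by-step: n={n_naive}, rounds={rounds_naive}")
print(f"extrapolating: n={n_fast}, rounds={rounds_fast}")
```

Under these synthetic assumptions both strategies stop at nearly the same number of examples, but the extrapolating variant needs only a handful of acquisition and model-building rounds. With a noisy, real learning curve the fit would be less exact and would likely need to be repeated over several acquisitions.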