Classification is a well-studied problem in data mining. Classification performance was originally gauged almost exclusively by predictive accuracy, but as the field progressed, more sophisticated measures of classifier utility were introduced that better represent the value of the induced knowledge. Nonetheless, most work still ignores the cost of acquiring training examples, even though this cost affects the total utility of the data mining process. In this article we analyze the relationship between the number of acquired training examples and the utility of the data mining process and, given the necessary cost information, determine the number of training examples that yields the optimum overall performance. We then extend this analysis to include the cost of model induction, measured as the CPU time required to generate the model. While our cost model does not take into account all possible costs, our analysis provides some useful insights and a template for future analyses using more sophisticated cost models. Because our analysis is based on experiments that acquire the full set of training examples, it cannot be used directly to find a classifier with optimal or near-optimal total utility. To address this issue we introduce two progressive sampling strategies that are empirically shown to produce classifiers with near-optimal total utility.
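The trade-off described above can be illustrated with a minimal Python sketch. It is not the article's cost model or either of its two sampling strategies, whose details are not given here. It assumes a simple additive cost (data acquisition, plus misclassification, plus CPU time), and it assumes a stopping rule that halts at the first increase in total cost. All names (`total_cost`, `best_training_size`, `progressive_sample`, `cost_per_example`, `cost_per_error`, `n_scored`, `cost_per_cpu_second`) are hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def total_cost(n_train: int, error_rate: float, *, cost_per_example: float,
               cost_per_error: float, n_scored: int,
               cpu_seconds: float = 0.0,
               cost_per_cpu_second: float = 0.0) -> float:
    """Assumed additive cost: acquisition + misclassification + induction."""
    data_cost = n_train * cost_per_example
    error_cost = error_rate * n_scored * cost_per_error
    induction_cost = cpu_seconds * cost_per_cpu_second
    return data_cost + error_cost + induction_cost


def best_training_size(curve: Iterable[Tuple[int, float, float]],
                       **costs: float) -> Tuple[int, float]:
    """Given a full learning curve of (n_train, error_rate, cpu_seconds)
    points, return the (n_train, cost) pair with the lowest total cost."""
    scored = ((n, total_cost(n, err, cpu_seconds=cpu, **costs))
              for n, err, cpu in curve)
    return min(scored, key=lambda pair: pair[1])


def progressive_sample(schedule: Iterable[int],
                       evaluate: Callable[[int], Tuple[float, float]],
                       **costs: float) -> Tuple[Optional[int], float, int]:
    """Grow the training set along `schedule` and stop at the first size
    whose total cost fails to improve (an assumed stopping rule).

    `evaluate(n)` returns (error_rate, cpu_seconds) for a model trained on
    n examples. Returns (best_n, best_cost, n_acquired); n_acquired can
    exceed best_n because the last batch is paid for before the rule fires.
    """
    best_n, best_cost, acquired = None, float("inf"), 0
    for n in schedule:
        acquired = n
        err, cpu = evaluate(n)
        cost = total_cost(n, err, cpu_seconds=cpu, **costs)
        if cost >= best_cost:
            break
        best_n, best_cost = n, cost
    return best_n, best_cost, acquired
```

With a full learning curve available, `best_training_size` finds the cost-minimizing size after the fact. `progressive_sample` has to decide while acquiring data, so it may overshoot by one step of the schedule. Under this assumed rule, that overshoot is why such a strategy can be near-optimal rather than optimal.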