C4.5: programs for machine learning
The power of sampling in knowledge discovery
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
The nature of statistical learning theory
Rigorous learning curve bounds from statistical mechanics
Machine Learning - Special issue on COLT '94
Efficient progressive sampling
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
A sequential sampling algorithm for a general class of utility criteria
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Statistical learning control of uncertain systems: theory and algorithms
Applied Mathematics and Computation
The Effects of Training Set Size on Decision Tree Complexity
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Sampling Large Databases for Association Rules
VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases
Journal of Artificial Intelligence Research
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 2
Selective Rademacher Penalization and Reduced Error Pruning of Decision Trees
The Journal of Machine Learning Research
ISMIS'05 Proceedings of the 15th international conference on Foundations of Intelligent Systems
Sampling can speed up the processing of large training-example databases, but without seeing all of the data, or knowing the process that produces the examples, it is impossible to know in advance what sample size guarantees good performance. Progressive sampling has been suggested to circumvent this problem: the sample size is increased according to some schedule until accuracy close to that which would be obtained using all of the data is reached. Determining this stopping time efficiently and accurately is a central difficulty in progressive sampling.

We study stopping-time determination by approximating the generalization error of the hypothesis, rather than by assuming the often-observed shape for the learning curve and trying to detect whether its final plateau has been reached. We use data-dependent generalization error bounds. Instead of the common cross-validation approach, we use the recently introduced Rademacher penalties, which have been observed to give good results on simple concept classes.

We experiment with two-level decision trees built by the learning algorithm T2, which finds a hypothesis with minimal error with respect to the sample. Stopping-time determination based on the theoretically well-motivated Rademacher penalties gives results that are much closer to those attained by heuristics based on assumptions about learning-curve shape than distribution-independent estimates based on the VC dimension do.
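The combination of a geometric sampling schedule with a data-dependent stopping criterion can be sketched as follows. This is a minimal illustration, not the paper's actual procedure: `train_fn` and `error_fn` are hypothetical callbacks standing in for an empirical-error-minimizing learner such as T2, and the penalty is estimated with the label-flipping trick for such learners (fit randomly relabeled data; the better the fit, the higher the complexity penalty).

```python
import random

def rademacher_penalty(train_fn, error_fn, X, y, n_trials=5):
    """Estimate a data-dependent Rademacher penalty (illustrative sketch).

    For a learner that minimizes empirical error, the penalty can be
    estimated by relabeling the sample with random signs and measuring
    how far below 1/2 the learner can push its error on the random
    labels: fitting noise well indicates a rich hypothesis class.
    """
    total = 0.0
    for _ in range(n_trials):
        # Flip each label independently with probability 1/2.
        y_flipped = [yi if random.random() < 0.5 else 1 - yi for yi in y]
        h = train_fn(X, y_flipped)
        total += 0.5 - error_fn(h, X, y_flipped)
    # The generalization bound charges twice the average complexity.
    return 2.0 * total / n_trials

def progressive_sample(train_fn, error_fn, X, y,
                       start=100, factor=2, tol=0.01):
    """Geometric progressive sampling with a bound-based stopping rule.

    Grow the sample geometrically and stop once the data-dependent
    error bound (training error + Rademacher penalty) stops improving
    by more than `tol`, instead of guessing where the learning curve
    plateaus.
    """
    n = start
    prev_bound = float("inf")
    while n <= len(y):
        Xs, ys = X[:n], y[:n]
        h = train_fn(Xs, ys)
        bound = error_fn(h, Xs, ys) + rademacher_penalty(
            train_fn, error_fn, Xs, ys)
        if prev_bound - bound < tol:  # bound has plateaued: stop here
            return h, n
        prev_bound = bound
        n *= factor
    # Schedule exhausted the data: fall back to the full sample.
    return train_fn(X, y), len(y)
```

With a cheap learner plugged in, the stopping rule halts as soon as enlarging the sample no longer tightens the bound, which is the role the Rademacher-penalty criterion plays in place of learning-curve heuristics.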