Sequential sampling algorithms have recently attracted interest as a way to design scalable algorithms for data mining and KDD processes. In this paper, we identify an elementary sequential sampling task (estimation from examples), from which one can derive many other tasks appearing in practice. We present a generic algorithm to solve this task, together with an analysis of its correctness and running time that is simpler and more intuitive than those existing in the literature. For two specific tasks, frequency estimation and advantage estimation, we derive lower bounds on running time in addition to the general upper bounds.
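
To make the idea of sequential sampling for frequency estimation concrete, the following is a minimal sketch of a classical stopping-rule estimator for the mean of a 0/1 random variable. It is not the generic algorithm of this paper. The function name `stopping_rule_estimate`, the `draw` callback, and the `max_draws` safety cap are hypothetical names introduced only for illustration, and the constants follow the commonly stated form of the stopping rule, so they should be checked against the original analysis before being relied upon. The key point is that the sample size is not fixed in advance: sampling continues until enough positive examples have been seen, so rarer events automatically get more samples.

```python
import math
import random
from typing import Callable, Tuple


def stopping_rule_estimate(
    draw: Callable[[], float],
    eps: float,
    delta: float,
    max_draws: int = 10_000_000,  # hypothetical safety cap, not part of the rule
) -> Tuple[float, int]:
    """Estimate the mean of a [0, 1]-valued variable to relative error eps.

    Sketch of a classical stopping rule: keep drawing until the running sum
    crosses a threshold that depends only on eps and delta, then return
    threshold / number_of_draws. Returns (estimate, number_of_draws).
    """
    if not (0.0 < eps < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("eps and delta must lie strictly between 0 and 1")

    # Assumed constants: upsilon = 4 (e - 2) ln(2 / delta) / eps^2
    upsilon = 4.0 * (math.e - 2.0) * math.log(2.0 / delta) / eps**2
    threshold = 1.0 + (1.0 + eps) * upsilon

    total = 0.0
    n = 0
    while total < threshold:
        if n >= max_draws:
            raise RuntimeError("max_draws reached; the mean may be too close to 0")
        total += draw()
        n += 1
    return threshold / n, n


if __name__ == "__main__":
    rng = random.Random(0)
    true_p = 0.2  # frequency to be estimated (illustrative value)
    est, used = stopping_rule_estimate(
        lambda: 1.0 if rng.random() < true_p else 0.0, eps=0.1, delta=0.05
    )
    print(f"estimate={est:.4f} after {used} draws")
```

With this rule the number of draws grows roughly as 1/(p ε²) times log(1/δ), where p is the unknown frequency, which is the kind of instance-dependent running time that upper and lower bounds for sequential sampling are concerned with.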