Many data mining tasks can be seen as instances of the problem of finding the most interesting patterns (according to some utility function) in a large database. In recent years, significant progress has been made in scaling algorithms for this task to very large databases through the use of sequential sampling techniques. However, except for sampling-based greedy algorithms, which cannot give absolute quality guarantees, existing approaches scale only with respect to the data, not with respect to the size of the pattern space: they universally assume that the entire hypothesis space fits in main memory. In this paper, we describe how this class of algorithms can be extended to hypothesis spaces that do not fit in memory while maintaining the algorithms' precise ε-δ quality guarantees. We present a constant-memory algorithm for this task and prove that it possesses the required properties. We conclude with an empirical comparison of variable-memory and constant-memory sampling.
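To make the sequential-sampling setting concrete, below is a minimal Python sketch of an ε-δ sampling loop that uses Hoeffding confidence intervals to return an approximately best pattern. This is the standard variable-memory baseline in which all patterns are held in memory, not the paper's constant-memory algorithm; the names (sequential_top_pattern, hoeffding_bound) and the assumption that utilities lie in [0, 1] are illustrative choices, not taken from the paper.

import math

def hoeffding_bound(n, delta):
    # Half-width of a two-sided (1 - delta) confidence interval for the
    # mean of n i.i.d. observations bounded in [0, 1] (Hoeffding's inequality).
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def sequential_top_pattern(patterns, draw_example, epsilon, delta, max_n=100000):
    # Sequentially draw examples until the empirically best pattern has,
    # with probability at least 1 - delta, true utility within epsilon of
    # the truly best pattern's.  `patterns` maps a pattern name to a
    # utility function returning values in [0, 1]; `draw_example` returns
    # one random database record.  Assumes at least two patterns, all of
    # which fit in memory (the variable-memory setting).
    sums = {p: 0.0 for p in patterns}
    # Union bound: split delta over all patterns and all sampling steps.
    # Adaptive schemes in the literature allocate delta less wastefully.
    per_test_delta = delta / (len(patterns) * max_n)
    for n in range(1, max_n + 1):
        x = draw_example()
        for p, utility in patterns.items():
            sums[p] += utility(x)
        means = {p: s / n for p, s in sums.items()}
        eps_n = hoeffding_bound(n, per_test_delta)
        best = max(means, key=means.get)
        runner_up = max(m for p, m in means.items() if p != best)
        # Stop once the best pattern's lower confidence bound is within
        # epsilon of every competitor's upper confidence bound; then the
        # returned pattern is epsilon-optimal with probability >= 1 - delta.
        if means[best] - eps_n >= runner_up + eps_n - epsilon:
            break
    return best, sums[best] / n

A toy usage, with two hypothetical patterns whose utilities are Bernoulli indicators of a uniform random record:

import random
rng = random.Random(0)
patterns = {
    "A": lambda x: float(x < 0.6),  # true utility 0.6
    "B": lambda x: float(x < 0.4),  # true utility 0.4
}
print(sequential_top_pattern(patterns, rng.random, epsilon=0.05, delta=0.05))

Because the dictionaries `sums` and `means` are proportional to the number of candidate patterns, this sketch makes visible exactly the memory dependence on the hypothesis space that the paper's constant-memory algorithm is designed to remove.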