In this paper we propose a scaling-up method that is applicable to essentially any induction algorithm based on discrete search. Applying the method to an algorithm makes its running time independent of the size of the database, while the decisions made are essentially identical to those that would be made given infinite data. The method works within pre-specified memory limits and, as long as the data are i.i.d., only requires accessing them sequentially. It gives anytime results, and can be used to produce batch, stream, time-changing and active-learning versions of an algorithm. We apply the method to learning Bayesian networks, developing an algorithm that is faster than previous ones by orders of magnitude, while achieving essentially the same predictive performance. We observe these gains on a series of large databases generated from benchmark networks, on the KDD Cup 2000 e-commerce data, and on a Web log containing 100 million requests.
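The abstract does not spell out the statistical machinery behind the method, but its central claim, that each search decision can be made from a bounded number of examples regardless of database size, is commonly realized with a concentration bound such as Hoeffding's inequality. The following is a minimal, hedged sketch of that general idea, not the authors' implementation: the function names (`choose_move`, `score`), the candidate-scoring interface, and the parameter defaults are hypothetical placeholders introduced only for illustration.

```python
"""
Sketch of a discrete-search step that reads examples sequentially and stops
as soon as a Hoeffding-style bound says the apparent best candidate move is
the true best with high probability. The number of examples consumed then
depends on the confidence delta and the score gap, not on the database size.
"""
import math


def hoeffding_bound(value_range, delta, n):
    """Half-width of the confidence interval on a mean of n observations
    whose values lie in an interval of width `value_range`."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def choose_move(candidates, examples, score, value_range=1.0, delta=1e-6,
                check_every=1000, max_examples=None):
    """Pick one of at least two candidate search moves from a sequential
    scan of examples, using only as many examples as the bound requires."""
    totals = {c: 0.0 for c in candidates}
    n = 0
    for x in examples:
        for c in candidates:
            totals[c] += score(c, x)  # per-example contribution to c's score
        n += 1
        if n % check_every == 0:
            ranked = sorted(candidates, key=lambda c: totals[c] / n, reverse=True)
            best, runner_up = ranked[0], ranked[1]
            gap = (totals[best] - totals[runner_up]) / n
            if gap > hoeffding_bound(value_range, delta, n):
                return best, n  # confident decision reached early
        if max_examples is not None and n >= max_examples:
            break  # optional budget for anytime behavior
    # Budget or data exhausted: return the current best estimate.
    return max(candidates, key=lambda c: totals[c] / n), n
```

Because the bound shrinks as O(1/sqrt(n)), the scan terminates once enough examples have been seen to separate the top two candidates at confidence 1 - delta; this is one plausible way to obtain the abstract's property that running time is decoupled from database size while the chosen moves match, with high probability, those an infinite-data learner would make.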