This paper introduces new algorithms and data structures for fast counting over machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach also applies to counting the number of records in a dataset that match a conjunctive query. Subject to certain assumptions, the cost of these operations can be shown to be independent of the number of records in the dataset and log-linear in the number of non-zero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use, and we give analytical worst-case bounds for this structure under several models of data distribution. We empirically demonstrate that tractably sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not expanding the tree fully near its leaves. We show how the ADtree can be used to accelerate Bayes net structure-finding algorithms, rule learning algorithms, and feature selection algorithms, and we provide a number of empirical results comparing ADtree methods against traditional direct counting approaches. We also discuss possible uses of ADtrees in other machine learning methods, and the merits of ADtrees in comparison with alternative representations such as kd-trees, R-trees, and frequent sets.
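To make the counting idea concrete, here is a minimal Python sketch of a cached-count tree in the spirit of the ADtree. It is an illustration under stated assumptions, not the paper's implementation: it covers only ideas (a) and (c) from the abstract (no nodes for zero counts, and unexpanded leaf lists for small subsets) and omits the deduction of counts from other counts in (b). The names `CountNode`, `build_tree`, `count`, `contingency_table`, and `leaf_size` are hypothetical, and attributes are assumed to be categorical values indexed by position.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Sequence[int]
Query = List[Tuple[int, int]]  # (attribute index, value) pairs, sorted by attribute


class CountNode:
    """Caches the count for one conjunctive query and refines it on later attributes."""

    def __init__(self, records: Sequence[Record], rows: List[int],
                 start_attr: int, leaf_size: int) -> None:
        self.count = len(rows)
        self.children: Dict[Tuple[int, int], "CountNode"] = {}
        self.leaf_rows: Optional[List[int]] = None
        if self.count <= leaf_size:
            # Idea (c): keep row indices instead of expanding a small subtree.
            self.leaf_rows = rows
            return
        for attr in range(start_attr, len(records[0])):
            buckets: Dict[int, List[int]] = {}
            for r in rows:
                buckets.setdefault(records[r][attr], []).append(r)
            # Idea (a): only values that actually occur get a node,
            # so zero counts never consume memory.
            for value, subset in buckets.items():
                self.children[(attr, value)] = CountNode(
                    records, subset, attr + 1, leaf_size)


def build_tree(records: Sequence[Record], leaf_size: int = 2) -> CountNode:
    return CountNode(records, list(range(len(records))), 0, leaf_size)


def count(node: CountNode, records: Sequence[Record], query: Query) -> int:
    """Count records matching every (attribute, value) pair in a sorted query."""
    if not query:
        return node.count
    if node.leaf_rows is not None:
        # Fall back to direct counting over the few rows kept at this leaf.
        return sum(all(records[r][a] == v for a, v in query)
                   for r in node.leaf_rows)
    child = node.children.get(query[0])
    if child is None:
        return 0  # a missing node means the count is zero
    return count(child, records, query[1:])


def contingency_table(node: CountNode, records: Sequence[Record],
                      attrs: Sequence[int],
                      prefix: Tuple[int, ...] = ()) -> Dict[Tuple[int, ...], int]:
    """Return the non-zero contingency table entries for sorted attribute indices."""
    if not attrs:
        return {prefix: node.count}
    table: Dict[Tuple[int, ...], int] = {}
    if node.leaf_rows is not None:
        for r in node.leaf_rows:
            key = prefix + tuple(records[r][a] for a in attrs)
            table[key] = table.get(key, 0) + 1
        return table
    for (attr, value), child in node.children.items():
        if attr == attrs[0]:
            table.update(contingency_table(
                child, records, attrs[1:], prefix + (value,)))
    return table


# Toy dataset: five records over three binary attributes.
records = [(0, 1, 0), (0, 1, 1), (1, 0, 1), (0, 1, 1), (1, 1, 0)]
root = build_tree(records, leaf_size=1)
matches = count(root, records, [(0, 0), (1, 1)])   # attribute 0 == 0 AND attribute 1 == 1
table = contingency_table(root, records, [0, 2])   # joint counts of attributes 0 and 2
```

Once the tree is built, a query that never reaches a leaf list is answered by following one child pointer per query term, without touching the records. The cost of that design is memory, which is the reason the abstract stresses sparsity. A fuller version would also drop the subtree for each attribute's most common value and recover its counts by subtraction, which this sketch does not attempt.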