Statistical methods for speech recognition
Statistical methods for speech recognition
Multilevel k-way partitioning scheme for irregular graphs
Journal of Parallel and Distributed Computing
An Introduction to Variational Methods for Graphical Models
Machine Learning
Independence is good: dependency-based histogram synopses for high-dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Selectivity estimation using probabilistic models
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data
IEEE Transactions on Knowledge and Data Engineering
Tractable learning of large Bayes net structures from sparse data
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Empirical analysis of predictive algorithms for collaborative filtering
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Hi-index | 0.00 |
In this paper we consider the problem of learning approximate Markov Random Fields (MRFs) from large transaction data. We rely on frequent itemsets to learn MRFs on the data. Since learning exact large MRFs is generally intractable, we resort to learning approximate MRFs. Our proposed modeling approach first employs graph partitioning to cluster variables into balanced disjoint partitions, and then augments important interactions across partitions to capture interdependencies across them. A novel treewidth based augmentation scheme is proposed to boost performance. We learn an exact local MRF for each partition and then combine all the local MRFs together to derive a global model of the data. A greedy approximate inference scheme is developed on this global model. We demonstrate the use of the learned MRFs on the selectivity estimation problem. Empirical evaluation on real datasets demonstrates the advantage of our approach over extant solutions.