Learning approximate MRFs from large transaction data

Authors:
Chao Wang;Srinivasan Parthasarathy
Affiliations:
Department of Computer Science and Engineering, The Ohio State University;Department of Computer Science and Engineering, The Ohio State University
Venue:
PKDD'06 Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2006

Abstract

In this paper we consider the problem of learning approximate Markov Random Fields (MRFs) from large transaction data, relying on frequent itemsets to learn the MRFs. Since learning exact large MRFs is generally intractable, we resort to learning approximate MRFs. Our proposed modeling approach first employs graph partitioning to cluster variables into balanced, disjoint partitions, and then augments the partitions with important cross-partition interactions to capture the interdependencies among them. A novel treewidth-based augmentation scheme is proposed to boost performance. We learn an exact local MRF for each partition and then combine the local MRFs to derive a global model of the data. A greedy approximate inference scheme is developed on this global model. We demonstrate the use of the learned MRFs on the selectivity estimation problem. Empirical evaluation on real datasets demonstrates the advantage of our approach over extant solutions.
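The partition-then-combine idea can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract gives no implementation details, so every function name here is hypothetical. A simple greedy merge stands in for graph partitioning, empirical counts stand in for the exact local MRFs, and the cross-partition augmentation step is omitted entirely, so partitions are treated as independent.

```python
from collections import Counter
from itertools import combinations


def pair_supports(transactions, min_support):
    """Count 2-itemsets and keep those with absolute support >= min_support."""
    counts = Counter()
    for t in transactions:
        for a, b in combinations(sorted(t), 2):
            counts[(a, b)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def greedy_partition(items, pairs, max_size):
    """Assumed stand-in for graph partitioning: merge the groups joined by
    the most frequent pairs first, capping every group at max_size items."""
    group_of = {i: frozenset([i]) for i in items}
    for (a, b), _ in sorted(pairs.items(), key=lambda kv: -kv[1]):
        ga, gb = group_of[a], group_of[b]
        if ga != gb and len(ga) + len(gb) <= max_size:
            merged = ga | gb
            for i in merged:
                group_of[i] = merged
    return sorted(set(group_of.values()), key=sorted)


def local_selectivity(transactions, group, query):
    """Assumed stand-in for a local model: the empirical fraction of
    transactions containing every query item that falls in this group."""
    wanted = query & group
    if not wanted:
        return 1.0
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


def estimate_selectivity(transactions, groups, query):
    """Combine local estimates as a product, i.e. assume the partitions are
    independent (no cross-partition augmentation in this sketch)."""
    estimate = 1.0
    for g in groups:
        estimate *= local_selectivity(transactions, g, query)
    return estimate


# Toy transaction data, invented for illustration only.
transactions = [
    frozenset("ab"), frozenset("ab"), frozenset("abcd"), frozenset("cd"),
    frozenset("cd"), frozenset("ab"), frozenset("cd"), frozenset("abcd"),
]
items = sorted(set().union(*transactions))
groups = greedy_partition(items, pair_supports(transactions, 2), max_size=2)

print(groups)  # two groups: {a, b} and {c, d}
print(estimate_selectivity(transactions, groups, frozenset("ab")))  # 0.625
print(estimate_selectivity(transactions, groups, frozenset("ac")))  # 0.390625
```

On this toy data the within-partition query {a, b} is estimated exactly (5 of 8 transactions), while the cross-partition query {a, c} is estimated at 0.390625 against a true selectivity of 0.25. That gap is the kind of error the abstract's augmentation of cross-partition interactions is meant to reduce.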