Reconstruction of Markov Random Fields from Samples: Some Observations and Algorithms

Authors:
Guy Bresler; Elchanan Mossel; Allan Sly
Affiliations:
Dept. of Electrical Engineering and Computer Sciences, U.C. Berkeley; Dept. of Statistics and Dept. of Electrical Engineering and Computer Sciences, U.C. Berkeley; Dept. of Statistics, U.C. Berkeley
Venue:
APPROX '08 / RANDOM '08 Proceedings of the 11th international workshop, APPROX 2008, and 12th international workshop, RANDOM 2008 on Approximation, Randomization and Combinatorial Optimization: Algorithms and Techniques
Year:
2008

Abstract

Markov random fields are used to model high-dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the underlying graph defining a Markov random field on $n$ nodes and maximum degree $d$ given observations. We show that under mild non-degeneracy conditions it reconstructs the generating graph with high probability using $\Theta(d \log n)$ samples, which is optimal up to a multiplicative constant. Our results seem to be the first results for general models that guarantee that the generating model is reconstructed. Furthermore, we provide an explicit $O(d n^{d+2} \log n)$ running time bound. In cases where the measure on the graph has correlation decay, the running time is $O(n^2 \log n)$ for all fixed $d$. In the full-length version we also discuss the effect of observing noisy samples. There we show that as long as the noise level is low, our algorithm is effective. On the other hand, we construct an example where large noise implies non-identifiability even for generic noise and interactions. Finally, we briefly show that in some cases, models with hidden nodes can also be recovered.
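The abstract does not spell out the algorithm, so the following is only an illustrative sketch of one exhaustive-search approach to structure recovery, not the authors' procedure. It assumes binary variables and a hypothetical threshold `eps`. For each node it looks for the smallest candidate set `U` of at most `d` nodes such that, once `U` is fixed, no single remaining node visibly shifts the empirical conditional distribution. All function names here are invented for the example. The three nested choices (node, candidate set, witness node) are loosely in line with the $n^{d+2}$ factor in the stated running time.

```python
from itertools import combinations, product


def cond_prob(samples, v, given):
    """Empirical P(x_v = 1 | x_i = val for each (i, val) in given).

    Returns None when the conditioning event never occurs in the samples.
    """
    matching = [s for s in samples if all(s[i] == val for i, val in given.items())]
    if not matching:
        return None
    return sum(s[v] for s in matching) / len(matching)


def is_separator(samples, v, U, eps):
    """True if, with x_U fixed, flipping any single other node w moves the
    empirical conditional probability of x_v by at most eps."""
    n = len(samples[0])
    rest = [w for w in range(n) if w != v and w not in U]
    for w in rest:
        for x_U in product([0, 1], repeat=len(U)):
            given = dict(zip(U, x_U))
            p0 = cond_prob(samples, v, {**given, w: 0})
            p1 = cond_prob(samples, v, {**given, w: 1})
            if p0 is not None and p1 is not None and abs(p0 - p1) > eps:
                return False
    return True


def estimate_neighborhood(samples, v, d, eps):
    """Smallest set U with |U| <= d that passes the separator test, or None."""
    n = len(samples[0])
    others = [u for u in range(n) if u != v]
    for size in range(d + 1):
        for U in combinations(others, size):
            if is_separator(samples, v, U, eps):
                return set(U)
    return None


def reconstruct_graph(samples, d, eps):
    """Map each node to its estimated neighborhood."""
    n = len(samples[0])
    return {v: estimate_neighborhood(samples, v, d, eps) for v in range(n)}


def chain_samples():
    """Toy data set (an assumption for illustration): 32 samples whose empirical
    distribution is exactly a binary chain 0 - 1 - 2 in which adjacent nodes
    agree with probability 3/4."""
    samples = []
    for x in product([0, 1], repeat=3):
        weight = (3 if x[0] == x[1] else 1) * (3 if x[1] == x[2] else 1)
        samples.extend([x] * weight)
    return samples


if __name__ == "__main__":
    print(reconstruct_graph(chain_samples(), d=2, eps=0.1))
```

On this toy data the sketch recovers the chain: node 1 is adjacent to nodes 0 and 2, and nodes 0 and 2 are each adjacent only to node 1. With real, finite samples the choice of `eps` and the number of samples would have to reflect the non-degeneracy conditions mentioned in the abstract. Testing one witness node at a time is a simplification made here for brevity.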