Sparse graphical models have proven to be a flexible class of multivariate probability models for approximating high-dimensional distributions. In this paper, we propose techniques that exploit this modeling ability for binary classification by discriminatively learning such models from labeled training data, i.e., using both positive and negative samples to optimize the structures of the two models. We explain why existing generative methods are difficult to adapt to this setting, and propose an alternative method consisting of two parts. First, we develop a novel method for learning tree-structured graphical models that optimizes an approximation of the log-likelihood ratio. We also formulate a joint objective for learning a nested sequence of optimal forest-structured models. Second, we construct a classifier by using ideas from boosting to learn a set of discriminative trees. The final classifier can be interpreted as a likelihood ratio test between two models with a larger set of pairwise features. We use cross-validation to determine the optimal number of edges in the final model. The algorithm presented in this paper also provides a method to identify a subset of the edges that are most salient for discrimination. Experiments show that the proposed procedure outperforms generative methods such as Tree Augmented Naïve Bayes and Chow-Liu, as well as their boosted counterparts.
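To make the tree-learning step concrete, the sketch below implements the classical Chow-Liu procedure that the paper uses as a generative baseline: fit a maximum-weight spanning tree over the feature graph, with edge weights given by empirical pairwise mutual information. This is a minimal illustration assuming binary features; the function names and the Kruskal-style union-find are my own choices, not the paper's. The paper's discriminative variant would replace the mutual-information weights with edge weights derived from an approximation of the log-likelihood ratio between the two class-conditional models.

```python
import itertools
import math
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (nats) between two binary columns."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * math.log(pxy / (px * py))
    return mi

def chow_liu_tree(data):
    """Return the edge list of the max-weight spanning tree over
    pairwise empirical mutual information (Kruskal's algorithm)."""
    d = data.shape[1]
    edges = sorted(
        ((mutual_information(data[:, i], data[:, j]), i, j)
         for i, j in itertools.combinations(range(d), 2)),
        reverse=True)
    parent = list(range(d))        # union-find forest
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u
    tree = []
    for _, i, j in edges:          # greedily add heaviest non-cycle edges
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Toy check: feature 1 copies feature 0, feature 2 is independent,
# so the learned tree must contain the edge (0, 1).
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 500)
x2 = rng.integers(0, 2, 500)
data = np.column_stack([x0, x0, x2])
print(chow_liu_tree(data))
```

Discriminative learning keeps this same spanning-tree machinery; only the edge-weight function changes, which is why the forest-structured sequence in the paper can be read off by truncating the sorted edge list after k edges.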