Search-based learning of latent tree models

  • Authors:
  • Nevin L. Zhang; Tao Chen

  • Affiliations:
  • Hong Kong University of Science and Technology (Hong Kong); Hong Kong University of Science and Technology (Hong Kong)

  • Venue:
  • Search-based learning of latent tree models
  • Year:
  • 2009

Abstract

A latent variable model is a statistical model that relates a set of observed variables (also called manifest variables) to a set of unobserved variables (also called latent variables). Examples include hidden Markov models (HMMs), latent class models, and factor models. In this thesis we study a class of latent variable models known as latent tree (LT) models. LT models are tree-structured Bayesian networks in which the leaf nodes represent manifest variables and the internal nodes represent latent variables. We investigate the automatic induction of LT models from data and the use of LT models in cluster analysis of categorical data.

Several search-based algorithms for learning LT models have been developed. However, important issues remain poorly understood. In this thesis we study three of them: operation granularity, efficient model evaluation, and range of model adjustment. The investigation sheds new light on search-based learning of LT models and leads to a new algorithm that is conceptually simpler and more efficient than the state of the art, yet finds better models.

LT models can be used for latent structure discovery, density estimation, and cluster analysis. In this thesis we address an issue that is critical to the application of LT models to cluster analysis, namely model interpretation, and we demonstrate empirically that LT analysis can discover interesting regularities in data that other methods fail to reveal.
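To make the definition of an LT model concrete, the following is a minimal sketch of the simplest case: a single latent root with three manifest leaves, which is a latent class model. It is not taken from the thesis. The variable names and all probability values are hypothetical, chosen only for illustration. The sketch shows how the probability of an observation is obtained by summing out the latent variable, and how the posterior over the latent variable can act as a soft cluster assignment.

```python
from itertools import product

# Hypothetical toy LT model (not from the thesis): one binary latent
# root H and three binary manifest leaves X1, X2, X3.
p_h = [0.6, 0.4]  # P(H = h)

# p_x_given_h[i][h][x] = P(X_i = x | H = h); all values are made up.
p_x_given_h = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.7, 0.3], [0.1, 0.9]],
]


def joint_with_latent(x):
    """Return [P(H = h, X = x) for each h], using the tree factorization."""
    result = []
    for h, ph in enumerate(p_h):
        term = ph
        for i, xi in enumerate(x):
            term *= p_x_given_h[i][h][xi]
        result.append(term)
    return result


def marginal(x):
    """P(X = x): sum the latent variable out of the joint distribution."""
    return sum(joint_with_latent(x))


def posterior(x):
    """P(H | X = x): a soft cluster assignment for the observation x."""
    joint = joint_with_latent(x)
    z = sum(joint)
    return [j / z for j in joint]


if __name__ == "__main__":
    for x in product([0, 1], repeat=3):
        print(x, round(marginal(x), 4), [round(p, 3) for p in posterior(x)])
```

In a model with several latent variables the same sum runs over every internal node of the tree, and it is usually computed by message passing along the tree rather than by explicit enumeration.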