Latent Dirichlet Co-Clustering

  • Authors:
  • M. Mahdi Shafiei;Evangelos E. Milios

  • Affiliations:
  • Dalhousie University, Canada;Dalhousie University, Canada

  • Venue:
  • ICDM '06 Proceedings of the Sixth International Conference on Data Mining
  • Year:
  • 2006

Abstract

We present a generative model for simultaneously clustering documents and terms. Our model is a four-level hierarchical Bayesian model in which each document is modeled as a random mixture of document topics, where each document topic is a distribution over segments of the text. Each segment, in turn, is modeled as a mixture of word topics, where each word topic is a distribution over words. We present efficient approximate inference techniques based on Markov chain Monte Carlo (MCMC) methods and a moment-matching algorithm for empirical Bayes parameter estimation. We report results on document modeling and on document and term clustering, comparing against other topic models and against clustering and co-clustering algorithms, including Latent Dirichlet Allocation (LDA), Model-based Overlapping Clustering (MOC), Model-based Overlapping Co-Clustering (MOCC), and Information-Theoretic Co-Clustering (ITCC).
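
The four levels described in the abstract can be illustrated with a small sampling sketch. This is only one plausible reading of the abstract, not the authors' implementation: the Dirichlet priors are assumed from the model's name, and the function name `sample_document` and the parameters `alpha_doc`, `alpha_seg`, and `beta` are hypothetical names introduced here for illustration. Inference (MCMC) and moment-matching estimation are not shown.

```python
import numpy as np

def sample_document(rng, alpha_doc, alpha_seg, beta, n_segments, words_per_segment):
    """Sample one document under an ASSUMED reading of the four-level model.

    alpha_doc: (K_doc,) Dirichlet prior over document topics (assumption).
    alpha_seg: (K_doc, K_word) Dirichlet prior over word topics for each
               document topic (assumption).
    beta:      (K_word, V) word distribution for each word topic.
    """
    # Level 1: the document is a random mixture of document topics.
    theta = rng.dirichlet(alpha_doc)
    segments = []
    for _ in range(n_segments):
        # Level 2: each segment is assigned one document topic.
        y = int(rng.choice(len(alpha_doc), p=theta))
        # Level 3: the segment is a mixture of word topics.
        phi = rng.dirichlet(alpha_seg[y])
        # Level 4: each word picks a word topic, then a vocabulary term.
        z = rng.choice(beta.shape[0], size=words_per_segment, p=phi)
        words = [int(rng.choice(beta.shape[1], p=beta[k])) for k in z]
        segments.append((y, words))
    return segments

# Toy usage with made-up sizes: 3 document topics, 4 word topics, 10 terms.
rng = np.random.default_rng(0)
K_doc, K_word, V = 3, 4, 10
alpha_doc = np.ones(K_doc)
alpha_seg = np.ones((K_doc, K_word))
beta = rng.dirichlet(np.ones(V), size=K_word)
doc = sample_document(rng, alpha_doc, alpha_seg, beta,
                      n_segments=5, words_per_segment=8)
```

Each entry of `doc` pairs a segment's document-topic index with its sampled word indices. Grouping documents by their document-topic mixture and terms by their word topics is what would make such a model a co-clustering of documents and terms.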