Model-based document clustering with a collapsed Gibbs sampler

  • Authors:
  • Daniel David Walker;Eric K. Ringger

  • Affiliations:
  • Brigham Young University, Provo, UT, USA;Brigham Young University, Provo, UT, USA

  • Venue:
  • Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2008

Abstract

Model-based algorithms are emerging as a preferred method for document clustering. As computing resources improve, methods such as Gibbs sampling have become more common for parameter estimation in these models. Gibbs sampling is well understood for many applications, but has not been extensively studied for use in document clustering. We explore the convergence rate, the possibility of label switching, and chain summarization methodologies for document clustering on a particular model, namely a mixture of multinomials model, and show that fairly simple methods can be employed, while still producing clusterings of superior quality compared to those produced with the EM algorithm.
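
To make the setting concrete, here is a minimal sketch of a collapsed Gibbs sampler for a mixture of multinomials. It is not taken from the paper: it assumes symmetric Dirichlet priors (hypothetical hyperparameters `alpha` on the cluster proportions and `beta` on each cluster's word distribution), integrates both sets of parameters out, and resamples one document's cluster label at a time from its conditional distribution. The function name `collapsed_gibbs`, the toy corpus, and all default values are illustrative assumptions, and the sketch returns only the final sample rather than applying any particular chain summarization method.

```python
import math
import random
from collections import Counter


def collapsed_gibbs(docs, n_clusters, vocab_size,
                    alpha=1.0, beta=0.1, n_iters=50, seed=0):
    """Return one sampled cluster label per document (the final Gibbs state).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    alpha, beta: assumed symmetric Dirichlet hyperparameters (illustrative).
    """
    rng = random.Random(seed)
    doc_counts = [Counter(doc) for doc in docs]

    # Random initial assignment and the sufficient statistics it implies.
    z = [rng.randrange(n_clusters) for _ in docs]
    docs_in = [0] * n_clusters                                   # documents per cluster
    words_in = [0] * n_clusters                                  # word tokens per cluster
    word_counts = [[0] * vocab_size for _ in range(n_clusters)]  # per-cluster word counts

    def update(d, k, sign):
        """Add (sign=+1) or remove (sign=-1) document d from cluster k."""
        docs_in[k] += sign
        for v, c in doc_counts[d].items():
            word_counts[k][v] += sign * c
            words_in[k] += sign * c

    for d, k in enumerate(z):
        update(d, k, +1)

    for _ in range(n_iters):
        for d in range(len(docs)):
            update(d, z[d], -1)  # condition on all other documents
            n_d = len(docs[d])
            log_p = []
            for k in range(n_clusters):
                # Prior term: how many other documents are already in cluster k.
                lp = math.log(docs_in[k] + alpha)
                # Dirichlet-multinomial predictive likelihood of document d.
                lp += math.lgamma(words_in[k] + vocab_size * beta)
                lp -= math.lgamma(words_in[k] + n_d + vocab_size * beta)
                for v, c in doc_counts[d].items():
                    lp += math.lgamma(word_counts[k][v] + c + beta)
                    lp -= math.lgamma(word_counts[k][v] + beta)
                log_p.append(lp)
            # Normalize in log space for stability, then sample a new label.
            m = max(log_p)
            weights = [math.exp(lp - m) for lp in log_p]
            z[d] = rng.choices(range(n_clusters), weights=weights)[0]
            update(d, z[d], +1)
    return z


# Toy corpus (made up): three documents over words 0-2, three over words 3-5.
toy_docs = [[0, 1, 2, 0, 1], [1, 2, 0, 0], [2, 2, 1, 0],
            [3, 4, 5, 3, 4], [4, 5, 3, 3], [5, 5, 4, 3]]
labels = collapsed_gibbs(toy_docs, n_clusters=2, vocab_size=6)
print(labels)
```

Because cluster labels in such a sampler are arbitrary indices, the same partition can appear under permuted labels across samples or chains. That is the label switching issue the abstract refers to, and it is why a single final sample, as returned here, is only the simplest possible summary of the chain.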