Identification of MCMC samples for clustering

  • Authors:
  • Kenichi Kurihara;Tsuyoshi Murata;Taisuke Sato

  • Affiliations:
  • Tokyo Institute of Technology, Tokyo, Japan;Tokyo Institute of Technology, Tokyo, Japan;Tokyo Institute of Technology, Tokyo, Japan

  • Venue:
  • LKR'08 Proceedings of the 3rd international conference on Large-scale knowledge resources: construction and application
  • Year:
  • 2008

Abstract

For clustering problems, many studies report only MAP assignments as clustering results instead of using all the samples from an MCMC sampler. This is because it is not straightforward to recognize clusters based on the whole set of samples. We therefore propose an identification algorithm that constructs groups of relevant clusters. The identification exploits spectral clustering to group clusters. Although a naive spectral clustering algorithm is intractable in both memory and computation time, we develop a memory- and time-efficient spectral clustering algorithm for samples from an MCMC sampler. In experiments, we show that our algorithm is tractable for real data while the naive algorithm is not. For search query log data, we also show representative vocabularies of clusters, which cannot be chosen from MAP assignments alone.
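
As a rough illustration of the naive approach mentioned in the abstract, the sketch below groups clusters drawn from several MCMC samples with a simple two-way spectral split. It is not the authors' algorithm: the Jaccard affinity, the restriction to two groups, and the function names `collect_clusters`, `jaccard_affinity`, and `spectral_bipartition` are all assumptions made for this example. It does show why the naive version scales poorly: the affinity matrix has one row and one column for every cluster in every sample.

```python
import numpy as np

def collect_clusters(samples):
    """Stack one binary indicator row per cluster found in each sample."""
    rows = []
    for labels in samples:
        labels = np.asarray(labels)
        for k in np.unique(labels):
            rows.append((labels == k).astype(float))
    return np.vstack(rows)

def jaccard_affinity(indicators):
    """Dense cluster-by-cluster Jaccard similarity (an assumed affinity)."""
    inter = indicators @ indicators.T
    sizes = indicators.sum(axis=1)
    union = sizes[:, None] + sizes[None, :] - inter
    return inter / union

def spectral_bipartition(affinity):
    """Split nodes into two groups by the sign of the Fiedler vector
    of the symmetric normalized graph Laplacian."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(affinity)) - normalized
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1] * d_inv_sqrt
    return (fiedler > 0).astype(int)

# Three hypothetical samples over six data points; the third has its
# labels switched relative to the first.
samples = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
]
indicators = collect_clusters(samples)
groups = spectral_bipartition(jaccard_affinity(indicators))
```

Here `groups` assigns each of the six collected clusters to one of two groups, so clusters covering roughly the same data points end up together regardless of the label they carried in their own sample. Which group receives label 0 or 1 is arbitrary, since the sign of an eigenvector is not determined.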