Model-based algorithms are emerging as a preferred method for document clustering. As computing resources improve, methods such as Gibbs sampling have become more common for estimating the parameters of these models. Gibbs sampling is well understood for many applications but has not been studied extensively for document clustering. We explore the convergence rate, the possibility of label switching, and chain summarization methods for document clustering with one particular model, a mixture of multinomials, and show that fairly simple methods suffice while still producing clusterings of higher quality than those produced with the EM algorithm.
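To make the setting concrete, the following is a minimal sketch of Gibbs sampling for clustering documents under a mixture of multinomials. It rests on several assumptions that the abstract does not state: a collapsed sampler (component parameters integrated out), symmetric Dirichlet priors with hypothetical hyperparameters `alpha` and `beta`, and the simplest possible chain summary, namely keeping the labels from the final sweep. The function name and its parameters are illustrative only and are not taken from the paper.

```python
import math
import random


def gibbs_mixture_of_multinomials(docs, n_clusters, vocab_size,
                                  alpha=1.0, beta=0.1, n_sweeps=50, seed=0):
    """Cluster documents with a collapsed Gibbs sampler (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    alpha, beta: assumed symmetric Dirichlet hyperparameters on the mixing
        weights and on each cluster's word distribution.
    Returns the cluster labels from the final sweep (an assumed, very simple
    way to summarize the chain).
    """
    rng = random.Random(seed)
    labels = [rng.randrange(n_clusters) for _ in docs]

    docs_in = [0] * n_clusters                              # documents per cluster
    words_in = [0] * n_clusters                             # word tokens per cluster
    counts = [[0] * vocab_size for _ in range(n_clusters)]  # per-cluster word counts

    def shift(doc, k, sign):
        """Add (sign=+1) or remove (sign=-1) a document from cluster k's counts."""
        docs_in[k] += sign
        words_in[k] += sign * len(doc)
        for w in doc:
            counts[k][w] += sign

    for doc, k in zip(docs, labels):
        shift(doc, k, +1)

    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            # Take document d out, then resample its label given all the others.
            shift(doc, labels[d], -1)
            log_p = []
            for k in range(n_clusters):
                lp = math.log(docs_in[k] + alpha)
                seen = {}
                # Dirichlet-multinomial predictive probability of the document,
                # built up one word token at a time.
                for i, w in enumerate(doc):
                    c = seen.get(w, 0)
                    lp += math.log(counts[k][w] + beta + c)
                    lp -= math.log(words_in[k] + vocab_size * beta + i)
                    seen[w] = c + 1
                log_p.append(lp)
            top = max(log_p)  # subtract the max before exponentiating for stability
            weights = [math.exp(lp - top) for lp in log_p]
            labels[d] = rng.choices(list(range(n_clusters)), weights=weights)[0]
            shift(doc, labels[d], +1)
    return labels


# Toy corpus: two documents over words {0, 1} and two over words {2, 3}.
toy_docs = [[0, 0, 1, 1, 0], [1, 0, 0, 1], [2, 3, 3, 2], [3, 3, 2, 2, 2]]
print(gibbs_mixture_of_multinomials(toy_docs, n_clusters=2, vocab_size=4))
```

Because cluster ids are arbitrary, two runs can describe the same partition with the ids permuted, which is the label switching issue the abstract mentions. Comparing runs by the partition they induce, rather than by raw label values, sidesteps this in the sketch.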