An extension of PLSA for document clustering

Authors:
Young-Min Kim; Jean-François Pessiot; Massih Reza Amini; Patrick Gallinari
Affiliations:
Université Pierre et Marie Curie, Paris, France (all authors)
Venue:
Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM '08)
Year:
2008

Abstract

In this paper, we propose an extension of the Probabilistic Latent Semantic Analysis (PLSA) model in which an extra latent variable allows the model to co-cluster documents and terms simultaneously. We show on three datasets that our extended model yields statistically significant improvements, with respect to two clustering measures, over the original PLSA and the multinomial mixture (MM) models.
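
For orientation, the original PLSA baseline named in the abstract can be sketched as a short EM procedure. This is a minimal illustration of standard PLSA only, not the extended model: the abstract does not give the parameterisation of the extra latent variable, so it is left out. The function names (`plsa`, `log_likelihood`, `hard_clusters`), the toy count matrix, and the choice to cluster documents by their most probable topic are all assumptions made for this example.

```python
import numpy as np


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit baseline PLSA by EM on a (documents x terms) count matrix.

    Returns P(z|d) with shape (docs, topics) and P(w|z) with shape
    (topics, terms). Hypothetical helper, not the paper's extended model.
    """
    counts = np.asarray(counts, dtype=float)
    rng = np.random.default_rng(seed)
    n_docs, n_terms = counts.shape
    eps = 1e-12  # guards against division by zero

    # Random initialisation, normalised so that each row is a distribution.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_terms))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to P(z|d) * P(w|z),
        # stored with shape (docs, topics, terms).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]
        post /= post.sum(axis=1, keepdims=True) + eps

        # Expected counts n(d,w) * P(z|d,w).
        weighted = counts[:, None, :] * post

        # M-step: re-estimate P(w|z) and P(z|d) from the expected counts.
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps

    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over (d, w) of n(d,w) * log P(w|d), with P(w|d) = sum_z P(z|d) P(w|z)."""
    counts = np.asarray(counts, dtype=float)
    p_w_d = p_z_d @ p_w_z
    return float((counts * np.log(p_w_d + 1e-12)).sum())


def hard_clusters(p_z_d):
    """Assign each document to its most probable topic (an assumed clustering rule)."""
    return p_z_d.argmax(axis=1)


# Toy document-term counts (made up for illustration): 4 documents, 4 terms.
toy_counts = np.array([
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
])
doc_topics, topic_terms = plsa(toy_counts, n_topics=2)
labels = hard_clusters(doc_topics)
```

In this baseline, a single latent variable plays the role of both document cluster and topic. The abstract's extension adds a further latent variable so that documents and terms are co-clustered simultaneously, which this sketch does not attempt to reproduce.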