OHSUMED: an interactive retrieval evaluation and new large test collection for research
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Machine learning in automated text categorization
ACM Computing Surveys (CSUR)
Document clustering based on non-negative matrix factorization
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Cluster ensembles --- a knowledge reuse framework for combining multiple partitions
The Journal of Machine Learning Research
Restrictive clustering and metaclustering for self-organizing document collections
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Non-negative Matrix Factorization with Sparseness Constraints
The Journal of Machine Learning Research
Combining Multiple Clusterings Using Evidence Accumulation
IEEE Transactions on Pattern Analysis and Machine Intelligence
A hierarchical consensus architecture for robust document clustering
ECIR'07 Proceedings of the 29th European conference on IR research
Deriving a thematically meaningful partition of an unlabeled text corpus is a challenging task. Compared with classic term-based document indexing, document representations based on latent thematic generative models can improve clustering quality. However, determining the optimal indexing technique a priori is not straightforward, since it depends on the clustering problem at hand and the partitioning strategy adopted. To overcome this indeterminacy, we propose deriving a consensus labeling from the results of clustering processes executed on several document representations. Experiments on subsets of two standard text corpora evaluate distinct clustering strategies based on latent thematic spaces and highlight the usefulness of consensus clustering in overcoming the indeterminacy of optimal document indexing.