A major problem for text clustering practitioners is that it is difficult to determine a priori which text representation and clustering technique are optimal for a given clustering problem. As a step towards building robust document partitioning systems, we present a strategy based on a hierarchical consensus clustering architecture that operates on a wide diversity of document representations and partitions. Experiments show that the proposed method yields a consensus clustering comparable to the best available individual clustering, even in the presence of a large number of poor individual labelings, and that it outperforms classic nonhierarchical consensus approaches in both performance and computational cost.
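To make the idea of a hierarchical consensus architecture more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a co-association consensus function with average-linkage merging, which is a stand-in chosen for illustration; the abstract does not say which consensus function the paper uses. The function names (`coassociation`, `flat_consensus`, `hierarchical_consensus`) and the `group_size` parameter are hypothetical. Individual labelings are combined in small groups, and the resulting consensus labelings are combined again until a single one remains.

```python
from itertools import combinations


def coassociation(labelings):
    """Fraction of labelings in which each pair of documents shares a cluster."""
    n = len(labelings[0])
    co = [[0.0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += 1.0 / len(labelings)
    return co


def flat_consensus(labelings, k):
    """Assumed consensus function: average-linkage merging on co-association."""
    co = coassociation(labelings)
    clusters = [[i] for i in range(len(co))]

    def similarity(pair):
        a, b = pair
        total = sum(co[i][j] for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    # Repeatedly merge the two most similar clusters until k remain.
    while len(clusters) > k:
        a, b = max(combinations(range(len(clusters)), 2), key=similarity)
        clusters[a] += clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b

    labels = [0] * len(co)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


def hierarchical_consensus(labelings, k, group_size=3):
    """Combine labelings group by group, level by level, until one remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    level = [list(labels) for labels in labelings]
    while len(level) > 1:
        level = [flat_consensus(level[start:start + group_size], k)
                 for start in range(0, len(level), group_size)]
    return level[0]
```

Each call to `flat_consensus` in this sketch works on at most `group_size` labelings. Whether this grouping reproduces the quality and cost advantages reported in the abstract depends on the consensus function and grouping that the paper actually uses, which the sketch does not attempt to replicate.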