A hierarchical consensus architecture for robust document clustering

Authors:
Xavier Sevillano; Germán Cobo; Francesc Alías; Joan Claudi Socoró
Affiliations:
Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, Ramon Llull University, Barcelona, Spain (all authors)
Venue:
ECIR'07: Proceedings of the 29th European Conference on IR Research
Year:
2007

Abstract

A major problem faced by text clustering practitioners is the difficulty of determining a priori which text representation and clustering technique are optimal for a given clustering problem. As a step towards building robust document partitioning systems, we present a strategy based on a hierarchical consensus clustering architecture that operates on a wide diversity of document representations and partitions. Experiments show that the proposed method yields a consensus clustering comparable to the best individual clustering available, even when a large number of the individual labelings are poor, and that it improves on classic nonhierarchical consensus approaches in both performance and computational cost.
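
The general idea of a hierarchical consensus can be illustrated with a minimal sketch. The abstract does not say which consensus function or hierarchy layout the authors use, so everything below is an assumption made for illustration: the consensus function (majority-vote co-association followed by connected components), the function names `consensus` and `hierarchical_consensus`, and the `group_size` parameter are all hypothetical. The sketch only shows the structural point that many partitions can be combined in small groups, stage by stage, instead of all at once.

```python
from itertools import combinations


def consensus(partitions):
    """Flat consensus over a small group of labelings (illustrative choice).

    Two documents are linked when they share a cluster in more than half
    of the partitions; the connected components of those links become
    the consensus clusters.
    """
    n = len(partitions[0])
    parent = list(range(n))  # union-find forest over document indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        votes = sum(p[i] == p[j] for p in partitions)
        if 2 * votes > len(partitions):
            parent[find(j)] = find(i)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


def hierarchical_consensus(partitions, group_size=3):
    """Merge partitions in groups of `group_size`, level by level."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    level = [list(p) for p in partitions]
    while len(level) > 1:
        level = [consensus(level[s:s + group_size])
                 for s in range(0, len(level), group_size)]
    return level[0]


# Seven toy labelings of six documents; some are noisy or poor.
labelings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping, different label names
    [0, 0, 1, 1, 1, 1],  # one document misplaced
    [0, 0, 0, 1, 1, 2],  # one cluster split
    [0, 1, 0, 1, 0, 1],  # poor labeling
    [0, 0, 0, 1, 1, 1],
    [2, 2, 2, 0, 0, 0],
]
print(hierarchical_consensus(labelings))  # [0, 0, 0, 1, 1, 1]
```

In this toy run the poor labeling is outvoted inside its own group, so it never reaches the next level. Each stage also compares only `group_size` labelings at a time, which hints at why a staged scheme might be cheaper than a single flat consensus, although the actual cost depends on the consensus function used.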