A hierarchical consensus architecture for robust document clustering

  • Authors:
  • Xavier Sevillano;Germán Cobo;Francesc Alías;Joan Claudi Socoró

  • Affiliations:
  • Department of Communications and Signal Theory, Enginyeria i Arquitectura La Salle, Ramon Llull University, Barcelona, Spain (all authors)

  • Venue:
  • ECIR'07 Proceedings of the 29th European conference on IR research
  • Year:
  • 2007

Abstract

A major problem for text clustering practitioners is the difficulty of determining a priori the optimal text representation and clustering technique for a given clustering problem. As a step towards building robust document partitioning systems, we present a strategy based on a hierarchical consensus clustering architecture that operates on a wide diversity of document representations and partitions. Experiments show that the proposed method yields a consensus clustering comparable to the best individual clustering available, even in the presence of a large number of poor individual labelings, and that it outperforms classic nonhierarchical consensus approaches in both performance and computational cost.
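
The general idea of a hierarchical consensus can be illustrated with a minimal Python sketch. The abstract does not specify the consensus function or how labelings are grouped, so everything here is an assumption made for illustration: the function names (`align`, `flat_consensus`, `hierarchical_consensus`) are hypothetical, the consensus function is a plain majority vote after label alignment, and the grouping is simply by position in the input list. The brute-force alignment over label permutations is only practical for a small number of clusters `k`.

```python
from collections import Counter
from itertools import permutations


def align(reference, labels, k):
    """Relabel `labels` with the permutation of 0..k-1 that best matches `reference`."""
    # Brute force over all k! permutations: an assumption suitable only for small k.
    best = max(
        permutations(range(k)),
        key=lambda p: sum(p[l] == r for l, r in zip(labels, reference)),
    )
    return [best[l] for l in labels]


def flat_consensus(labelings, k):
    """Majority vote per document, after aligning every labeling to the first one."""
    reference = labelings[0]
    aligned = [reference] + [align(reference, lab, k) for lab in labelings[1:]]
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


def hierarchical_consensus(labelings, k, group_size=3):
    """Combine labelings in small groups, stage by stage, until one remains."""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")
    if not labelings:
        raise ValueError("at least one labeling is required")
    level = [list(lab) for lab in labelings]
    while len(level) > 1:
        # Each stage replaces every group of labelings by its consensus labeling.
        level = [
            flat_consensus(level[i:i + group_size], k)
            for i in range(0, len(level), group_size)
        ]
    return level[0]
```

A call such as `hierarchical_consensus(labelings, k=2, group_size=3)` takes a list of equal-length label lists, one per individual clustering, and returns a single consensus label list. In this sketch each stage only ever combines `group_size` labelings at a time, which is one plausible reading of why a hierarchical scheme might cost less than combining all labelings at once; the paper itself should be consulted for the actual architecture.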