An iterative clustering method for the XML-mining task of the INEX 2010

  • Authors:
  • Mireya Tovar;Adrián Cruz;Blanca Vázquez;David Pinto;Darnes Vilariño;Azucena Montes

  • Affiliations:
  • Benemérita Universidad Autónoma de Puebla and Centro Nacional de Investigación y Desarrollo Tecnológico, México;Instituto Tecnológico de Cerro Azul, México and Centro Nacional de Investigación y Desarrollo Tecnológico, México;Instituto Tecnológico de Tuxtla Gutiérrez and Centro Nacional de Investigación y Desarrollo Tecnológico, México;Benemérita Universidad Autónoma de Puebla, México;Benemérita Universidad Autónoma de Puebla, México;Centro Nacional de Investigación y Desarrollo Tecnológico, México

  • Venue:
  • INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval
  • Year:
  • 2010

Abstract

In this paper we propose two iterative clustering methods for grouping the documents of a very large Wikipedia collection. The recursive method iteratively clusters subsets of the complete collection. In each iteration, we select representative items for each group, and these representatives are then used in the next stage of clustering. The presented approaches are scalable algorithms that can be applied to huge collections that would otherwise (for instance, with classic clustering methods) be too computationally expensive to cluster. The obtained results outperformed the random baseline presented in the INEX 2010 clustering task of the XML-Mining track.
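The abstract does not specify the clustering algorithm, the similarity measure, the subset size, or how representatives are chosen. The Python sketch below shows the general scheme (cluster subsets, keep one representative per group, re-cluster the representatives) under assumptions that are not taken from the paper. Documents are bag-of-words term-frequency vectors compared by cosine similarity. Each subset uses a hypothetical one-pass, seed-based assignment in which the first `k` items are the seeds. Each group's medoid serves as its representative. The names `iterative_clustering`, `cluster_subset`, `medoid`, `k`, and `subset_size` are illustrative only.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_subset(ids, vecs, k):
    """Hypothetical one-pass clustering (an assumption, not the paper's method):
    the first k ids act as seeds and every id joins its most similar seed.
    Returns the non-empty groups."""
    seeds = ids[:k]
    groups = {s: [] for s in seeds}
    for i in ids:
        best = max(seeds, key=lambda s: cosine(vecs[i], vecs[s]))
        groups[best].append(i)
    return [g for g in groups.values() if g]


def medoid(group, vecs):
    """Assumed representative: the member most similar, in total, to its group."""
    return max(group, key=lambda i: sum(cosine(vecs[i], vecs[j]) for j in group))


def iterative_clustering(vecs, k, subset_size):
    """Cluster fixed-size subsets, keep one representative per group, and
    re-cluster the representatives until at most k of them remain."""
    if subset_size <= k:
        raise ValueError("subset_size must exceed k so each stage shrinks the set")
    current = list(range(len(vecs)))        # items clustered at this stage
    members = {i: [i] for i in current}     # representative -> original documents
    while len(current) > k:
        next_level, next_members = [], {}
        for start in range(0, len(current), subset_size):
            subset = current[start:start + subset_size]
            for group in cluster_subset(subset, vecs, k):
                rep = medoid(group, vecs)
                next_level.append(rep)
                # The representative inherits every document its group covered.
                next_members[rep] = [d for m in group for d in members[m]]
        current, members = next_level, next_members
    return [sorted(members[r]) for r in current]


docs = ["xml retrieval element", "football goal team", "xml xml retrieval",
        "goal team team", "element retrieval", "football football goal"]
vecs = [Counter(d.split()) for d in docs]
print(iterative_clustering(vecs, k=2, subset_size=3))  # → [[0, 2, 4], [1, 3, 5]]
```

Each stage only compares items within a subset of at most `subset_size` items, so no stage needs similarities across the whole collection. The trade-off is that decisions made locally in early stages, such as the naive seeding assumed here, cannot be revised later. In the example, document 4 is split from its topic at the first stage and only rejoins it at the third.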