An iterative clustering method for the XML-mining task of the INEX 2010

Authors:
Mireya Tovar; Adrián Cruz; Blanca Vázquez; David Pinto; Darnes Vilariño; Azucena Montes
Affiliations:
Benemérita Universidad Autónoma de Puebla and Centro Nacional de Investigación y Desarrollo Tecnológico, México; Instituto Tecnológico de Cerro Azul, México and Centro Nacional de Investigación y Desarrollo Tecnológico, México; Instituto Tecnológico de Tuxtla Gutiérrez and Centro Nacional de Investigación y Desarrollo Tecnológico, México; Benemérita Universidad Autónoma de Puebla, México; Benemérita Universidad Autónoma de Puebla, México; Centro Nacional de Investigación y Desarrollo Tecnológico, México
Venue:
INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval
Year:
2010

Abstract

In this paper we propose two iterative clustering methods for grouping the Wikipedia documents of a very large collection into clusters. The recursive method iteratively clusters subsets of the complete collection. In each iteration, representative items are selected for each group and then used in the next stage of clustering. The presented approaches are scalable algorithms that can be applied to very large collections which would otherwise be computationally expensive to cluster (for instance, with classic clustering methods). The results obtained outperformed the random baseline presented in the INEX 2010 clustering task of the XML-Mining track.
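
The idea of clustering subsets, keeping one representative per group, and re-clustering the representatives can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the base clustering algorithm, or how representatives are chosen, so everything below is an assumption made for illustration: cosine similarity over raw term frequencies, a greedy single-pass ("leader") clustering step, the first member of each group as its representative, and the hypothetical names `iterative_cluster`, `leader_cluster`, `batch_size` and `threshold`. It is not the authors' implementation.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def leader_cluster(ids, vectors, threshold):
    """Assumed base step: greedy single-pass clustering of one subset.

    Each item joins the first cluster whose leader (element 0) is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []
    for i in ids:
        for c in clusters:
            if cosine(vectors[i], vectors[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def iterative_cluster(docs, batch_size=3, threshold=0.5):
    """Cluster `docs` (id -> text) by repeatedly clustering small subsets.

    Only the representatives of one round are clustered in the next round,
    so no round compares more than `batch_size` items with each other.
    """
    if not docs:
        return []
    vectors = {i: Counter(text.lower().split()) for i, text in docs.items()}
    members = {i: [i] for i in docs}  # representative -> documents it stands for
    current = list(docs)
    while True:
        final = len(current) <= batch_size  # everything fits in one subset
        size = len(current) if final else batch_size
        merged = {}
        for start in range(0, len(current), size):
            subset = current[start:start + size]
            for cluster in leader_cluster(subset, vectors, threshold):
                # Assumption: the leader represents the group in the next round.
                merged[cluster[0]] = [d for r in cluster for d in members[r]]
        shrunk = len(merged) < len(current)
        members, current = merged, list(merged)
        if final or not shrunk:  # stop when a round merges nothing further
            return list(members.values())


if __name__ == "__main__":
    toy_docs = {
        "d1": "xml retrieval evaluation",
        "d2": "xml retrieval track",
        "d3": "football match result",
        "d4": "football match score",
        "d5": "xml retrieval mining",
        "d6": "football league match",
    }
    print(iterative_cluster(toy_docs, batch_size=3, threshold=0.5))
    # prints [['d1', 'd2', 'd5'], ['d3', 'd4', 'd6']]
```

In this toy run the six documents are never clustered all at once: two subsets of three are clustered first, and only the surviving representatives are compared in later rounds. A realistic setting would likely use weighted term vectors and a stronger base clusterer, and could pick the representative differently (for example, the member closest to the group centroid).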