A vector space model for automatic indexing
Communications of the ACM
Information Theory, Inference & Learning Algorithms
Information Theory, Inference & Learning Algorithms
Overview of the INEX 2010 XML mining track: clustering and classification of XML documents
INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval
Hi-index | 0.00 |
In this paper we propose two iterative clustering methods for grouping Wikipedia documents of a given huge collection into clusters. The recursive method clusters iteratively subsets of the complete collection. In each iteration, we select representative items for each group, which are then used for the next stage of clustering. The presented approaches are scalable algorithms which may be used with huge collections that in other way (for instance, using the classic clustering methods) would be computationally expensive of being clustered. The obtained results outperformed the random baseline presented in the INEX 2010 clustering task of the XML-Mining track.