Exploiting index pruning methods for clustering XML collections

  • Authors:
  • Ismail Sengor Altingovde;Duygu Atilgan;Özgür Ulusoy

  • Affiliations:
  • Department of Computer Engineering, Bilkent University, Ankara, Turkey;Department of Computer Engineering, Bilkent University, Ankara, Turkey;Department of Computer Engineering, Bilkent University, Ankara, Turkey

  • Venue:
  • INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
  • Year:
  • 2009

Abstract

In this paper, we first employ the well-known Cover-Coefficient Based Clustering Methodology (C3M) for clustering XML documents. Next, we apply index pruning techniques from the literature to reduce the size of the document vectors. Our experiments show that, in certain cases, it is possible to prune up to 70% of the collection (more specifically, of the underlying document vectors) and still generate a clustering structure of the same quality as that obtained from the original collection, in terms of a set of evaluation metrics.
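
To make the idea of pruning document vectors concrete, here is a minimal sketch in Python. The abstract does not say which pruning techniques were used, so this should be read as an illustration only: it assumes a simple per-document strategy that keeps the highest-weighted terms of each vector and drops the rest. The function name `prune_document_vector`, the `prune_ratio` parameter, and the sample weights are all hypothetical and do not come from the paper.

```python
from typing import Dict

# A document vector maps each term to its weight (e.g. a tf-idf score).
DocVector = Dict[str, float]


def prune_document_vector(vector: DocVector, prune_ratio: float) -> DocVector:
    """Keep only the highest-weighted terms of one document vector.

    Assumption: pruning is applied per document by term weight. This is an
    illustrative strategy, not necessarily the one evaluated in the paper.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    if not vector:
        return {}
    # Always keep at least one term so that no document becomes empty.
    keep = max(1, round(len(vector) * (1.0 - prune_ratio)))
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:keep])


# Hypothetical vector with ten terms and made-up weights 1.0 .. 10.0.
example = {f"term{i}": float(i) for i in range(1, 11)}
pruned = prune_document_vector(example, prune_ratio=0.7)
print(sorted(pruned))
```

With a prune ratio of 0.7, only about 30% of each vector's terms survive. The pruned vectors would then be passed to the clustering step in place of the original ones.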