Clustering XML documents using structural summaries

  • Authors:
  • Theodore Dalamagas; Tao Cheng; Klaas-Jan Winkel; Timos Sellis

  • Affiliations:
  • School of Electrical and Computer Engineering, National Technical University of Athens, Zographou, Athens, Greece; Department of Computer Science, University of California, Santa Barbara, CA; Faculty of Computer Science, University of Twente, Enschede, The Netherlands; School of Electrical and Computer Engineering, National Technical University of Athens, Zographou, Athens, Greece

  • Venue:
  • EDBT'04: Proceedings of the 2004 International Conference on Current Trends in Database Technology
  • Year:
  • 2004

Abstract

This work presents a methodology for grouping structurally similar XML documents using clustering algorithms. Modeling XML documents as tree-like structures, we treat the ‘clustering XML documents by structure’ problem as a ‘tree clustering’ problem, exploiting distances that estimate the similarity between those trees in terms of the hierarchical relationships of their nodes. We suggest using tree structural summaries to speed up the distance calculation while maintaining, or even improving, its quality. Experimental results are provided using a prototype testbed.
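The pipeline in the abstract (XML document to tree, tree to structural summary, pairwise distance, clustering) can be illustrated with a minimal Python sketch. The abstract does not define the summary or the distance, so everything below is an assumption for illustration only: `summarize` (hypothetical) collapses an element nested inside an ancestor with the same tag and merges same-tag siblings; `path_distance` (hypothetical) is a Jaccard distance over root-to-node tag paths, a simple stand-in for the tree distances the paper uses, not the authors' measure; and `cluster` (hypothetical) is single-link clustering at a fixed distance threshold, chosen only because it is short.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labeled, ordered tree node built from an XML element's tag."""
    label: str
    children: list[Node] = field(default_factory=list)


def to_tree(elem: ET.Element) -> Node:
    """Keep only element structure (tags); text and attributes are ignored."""
    return Node(elem.tag, [to_tree(c) for c in elem])


def _lift_nested(children: list[Node], path: tuple[str, ...]) -> list[Node]:
    # Assumed nesting reduction: a child whose tag already occurs on the
    # root-to-node path is dropped and its own children are lifted up.
    out: list[Node] = []
    for c in children:
        if c.label in path:
            out.extend(_lift_nested(c.children, path))
        else:
            out.append(c)
    return out


def summarize(node: Node, path: tuple[str, ...] = ()) -> Node:
    """Hypothetical structural summary: nesting reduction plus merging of
    same-tag siblings (their children are pooled and summarized together)."""
    path = path + (node.label,)
    groups: dict[str, list[Node]] = {}  # tag -> pooled children, first-seen order
    for c in _lift_nested(node.children, path):
        groups.setdefault(c.label, []).extend(c.children)
    return Node(node.label, [summarize(Node(t, kids), path) for t, kids in groups.items()])


def label_paths(node: Node, prefix: tuple[str, ...] = ()) -> set[tuple[str, ...]]:
    """All root-to-node tag paths; they encode the parent/child hierarchy."""
    p = prefix + (node.label,)
    paths = {p}
    for c in node.children:
        paths |= label_paths(c, p)
    return paths


def path_distance(a: Node, b: Node) -> float:
    """Stand-in distance: 1 - Jaccard similarity of the two path sets."""
    pa, pb = label_paths(a), label_paths(b)
    return 1 - len(pa & pb) / len(pa | pb)


def cluster(docs: list[str], threshold: float) -> list[list[int]]:
    """Single-link clustering: documents closer than `threshold` are joined
    (connected components, tracked with union-find)."""
    trees = [summarize(to_tree(ET.fromstring(d))) for d in docs]
    parent = list(range(len(trees)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(trees)):
        for j in range(i + 1, len(trees)):
            if path_distance(trees[i], trees[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(trees)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    docs = [
        "<a><b><c/></b><b><c/><d/></b></a>",  # repeated <b> siblings
        "<a><b><d/><c/></b></a>",             # same structure, no repetition
        "<x><y/></x>",                        # unrelated structure
    ]
    print(cluster(docs, threshold=0.3))  # → [[0, 1], [2]]
```

Computing the distance on the summaries means repeated siblings no longer inflate the trees being compared: the first two documents differ only in repetition and sibling order, so their summaries share every tag path and they land in the same cluster.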