XML documents clustering by structures

  • Authors:
  • Richi Nayak; Sumei Xu

  • Affiliations:
  • School of Information Systems, Queensland University of Technology, Brisbane, Australia; School of Information Systems, Queensland University of Technology, Brisbane, Australia

  • Venue:
  • INEX'05: Proceedings of the 4th International Conference on Initiative for the Evaluation of XML Retrieval
  • Year:
  • 2005

Abstract

XCLS is a novel clustering algorithm that groups heterogeneous XML documents by measuring their level similarity with a global criterion function. XCLS does not require pairwise similarity to be computed between individual documents; instead, it measures similarity at the cluster level, using the structural information of the XML documents. The quality of the clustering solution depends on how the level similarity is calculated and on whether it correctly represents the documents' structural similarity. In this paper, we present the performance of XCLS in clustering the structural descriptions (ordered labeled trees) of XML documents. We report results for the 5 sub-tasks corresponding to the 5 corpora provided by the INEX 2005 document mining track.
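The abstract does not give the level-similarity formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each document tree is reduced to the set of distinct element labels at each depth, and it swaps in a level-weighted, Jaccard-style overlap in which labels shared near the root count more (weight `base ** (depth - i - 1)`). All names (`level_structure`, `level_similarity`, `merge_levels`, `cluster_documents`), the `(label, [children])` tree tuples, and the defaults `base=2.0` and `threshold=0.5` are hypothetical. Sibling order is ignored, and the global criterion function is not modeled; the sketch performs a single incremental assignment pass.

```python
from typing import List, Set, Tuple

# Hypothetical tree representation: (label, [child trees]).
Tree = Tuple[str, list]
Levels = List[Set[str]]


def level_structure(tree: Tree) -> Levels:
    """Collect the distinct element labels at each depth (root is level 0)."""
    levels: Levels = []
    frontier = [tree]
    while frontier:
        levels.append({label for label, _ in frontier})
        frontier = [child for _, children in frontier for child in children]
    return levels


def _at(levels: Levels, i: int) -> Set[str]:
    # A level beyond a shallower structure counts as empty.
    return levels[i] if i < len(levels) else set()


def level_similarity(doc: Levels, cluster: Levels, base: float = 2.0) -> float:
    """ASSUMED level-weighted overlap in [0, 1], not the paper's formula.

    Level i gets weight base ** (depth - i - 1), so labels shared near the
    root count more than labels shared deep in the tree.
    """
    depth = max(len(doc), len(cluster))
    matched = total = 0.0
    for i in range(depth):
        weight = base ** (depth - i - 1)
        d, c = _at(doc, i), _at(cluster, i)
        matched += weight * len(d & c)
        total += weight * len(d | c)
    return matched / total if total else 0.0


def merge_levels(cluster: Levels, doc: Levels) -> Levels:
    """A cluster's level structure is the per-level union of its members'."""
    depth = max(len(cluster), len(doc))
    return [_at(cluster, i) | _at(doc, i) for i in range(depth)]


def cluster_documents(trees: List[Tree], threshold: float = 0.5) -> List[int]:
    """One incremental pass: join the most similar cluster or start a new one."""
    clusters: List[Levels] = []
    assignment: List[int] = []
    for tree in trees:
        doc = level_structure(tree)
        # Compare the document with each cluster, never with other documents.
        scores = [level_similarity(doc, c) for c in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=-1)
        if best >= 0 and scores[best] >= threshold:
            clusters[best] = merge_levels(clusters[best], doc)
            assignment.append(best)
        else:
            clusters.append(doc)
            assignment.append(len(clusters) - 1)
    return assignment


docs = [
    ("book", [("title", []), ("author", [])]),
    ("book", [("title", []), ("author", []), ("year", [])]),
    ("movie", [("director", []), ("cast", [])]),
]
print(cluster_documents(docs))  # [0, 0, 1]
```

Because each cluster keeps a merged level structure, every incoming document is compared with the current clusters rather than with every other document, which illustrates the abstract's point that pairwise document similarity is not required. The two `book` documents share their root and two second-level labels (similarity 0.8 under these assumptions), while the `movie` document shares nothing and opens a new cluster.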