XCLS: a fast and effective clustering algorithm for heterogenous XML documents

  • Authors:
  • Richi Nayak; Sumei Xu

  • Affiliations:
  • School of Information Systems, Queensland University of Technology, Brisbane, Australia; School of Information Systems, Queensland University of Technology, Brisbane, Australia

  • Venue:
  • PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2006

Abstract

We present a novel clustering algorithm that groups XML documents by structural similarity. We introduce a Level structure format to represent XML documents for efficient processing. We develop a global criterion function that does not require pair-wise similarity to be computed between individual documents; instead, it measures similarity at the clustering level, utilising the structural information of the XML documents. The experimental analysis shows the method to be fast and accurate.
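The idea of comparing a document against a cluster summary, rather than against every other document, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not define the Level structure or the criterion function, so `level_structure` (tag sets per depth), `level_similarity` (a simple overlap ratio), `cluster_documents`, and the `threshold` parameter are hypothetical stand-ins, not the authors' method.

```python
import xml.etree.ElementTree as ET


def level_structure(xml_text):
    """Return a list of sets: the element tags found at each depth of the tree."""
    root = ET.fromstring(xml_text)
    levels = []
    frontier = [root]
    while frontier:
        levels.append({node.tag for node in frontier})
        # Iterating an Element yields its direct children.
        frontier = [child for node in frontier for child in node]
    return levels


def level_similarity(doc_levels, cluster_levels):
    """Hypothetical score in [0, 1]: tags shared at the same level over all tags seen.

    This is a placeholder measure, not the criterion function from the paper.
    """
    depth = max(len(doc_levels), len(cluster_levels))
    common = 0
    total = 0
    for i in range(depth):
        d = doc_levels[i] if i < len(doc_levels) else set()
        c = cluster_levels[i] if i < len(cluster_levels) else set()
        common += len(d & c)
        total += len(d | c)
    return common / total if total else 0.0


def cluster_documents(docs, threshold=0.5):
    """Single pass: each document is compared with cluster summaries only."""
    clusters = []  # each cluster keeps a merged level structure and member indices
    for idx, xml_text in enumerate(docs):
        levels = level_structure(xml_text)
        best, best_score = None, 0.0
        for cluster in clusters:
            score = level_similarity(levels, cluster["levels"])
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            # Merge the document's tags into the cluster summary, level by level.
            for i, tags in enumerate(levels):
                if i < len(best["levels"]):
                    best["levels"][i] |= tags
                else:
                    best["levels"].append(set(tags))
            best["members"].append(idx)
        else:
            clusters.append({"levels": [set(s) for s in levels], "members": [idx]})
    return clusters
```

In this sketch the number of comparisons grows with the number of clusters rather than with the number of document pairs, which is the property the abstract attributes to measuring similarity at the clustering level.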