A methodology for clustering XML documents based on labeled tree

  • Authors:
  • Lei Liu;Yongqing Zheng;Baoshi Ding;Haiyan Liu

  • Affiliations:
  • School of Computer Science and Technology, Shandong University, Jinan, China;School of Computer Science and Technology, Shandong University, Jinan, China;School of Computer Science and Technology, Shandong University, Jinan, China;Software College, Hebei Normal University, Shijiazhuang, China

  • Venue:
  • FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

The amount of XML documents is increasing rapidly. In order to analyze the information represented in XML documents efficiently, researches on XML document clustering are actively in progress. The key issue is how to devise the similarity measure between XML documents to be used for clustering. Since XML documents have hierarchical structure, it is not appropriate to cluster them by using a general document similarity measure. In this paper, we propose the novel similarity calculation measure by reducing Nesting and repeating in the whole XML document. Then propose an improved Edge-set comparison algorithm to calculate two XML documents' similarity. Our experiments show that the proposed method improves accuracy on the clustering, compared to the previous works.