Clustering XML documents based on structural similarity

  • Authors:
  • Guangming Xing;Zhonghang Xia;Jinhua Guo

  • Affiliations:
  • Department of Computer Science, Western Kentucky University, Bowling Green, KY;Department of Computer Science, Western Kentucky University, Bowling Green, KY;Computer and Information Science Department, University of Michigan - Dearborn, Dearborn, MI

  • Venue:
  • DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
  • Year:
  • 2007

Abstract

In this paper, we present a framework for clustering XML documents based on their structural similarity. First, we justify using the edit distance between XML documents and schemata as the measure of structural similarity. Second, we give a novel solution for schema extraction. The solution is based on the minimum description length (MDL) principle and allows a trade-off between schema simplicity and precision according to the user's specification. Third, we discuss clustering XML documents based on the edit distance. The efficacy and efficiency of our methodology have been tested on both real and synthesized data.
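
To make the idea of clustering by structural edit distance concrete, the following is a minimal sketch and not the authors' method. The abstract measures distance between a document and a schema; this sketch substitutes a simpler document-to-document distance, a Selkow-style top-down tree edit distance over element tags. The cost model (relabel costs 1, inserting or deleting a subtree costs its node count), the greedy threshold clustering, and the names `tree_size`, `tree_distance`, and `cluster` are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def tree_size(node):
    """Number of elements in the subtree rooted at node."""
    return 1 + sum(tree_size(child) for child in node)


def tree_distance(a, b):
    """Selkow-style top-down edit distance between two element trees.

    Assumed costs: relabeling a node costs 1; inserting or deleting a
    whole subtree costs the number of nodes in that subtree.
    Text and attributes are ignored, only tag structure is compared.
    """
    ka, kb = list(a), list(b)
    m, n = len(ka), len(kb)
    # d[i][j]: cost of turning the first i children of a into the first j of b
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + tree_size(ka[i - 1])
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + tree_size(kb[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + tree_size(ka[i - 1]),      # delete subtree
                d[i][j - 1] + tree_size(kb[j - 1]),      # insert subtree
                d[i - 1][j - 1] + tree_distance(ka[i - 1], kb[j - 1]),
            )
    relabel = 0 if a.tag == b.tag else 1
    return relabel + d[m][n]


def cluster(docs, threshold):
    """Greedy clustering (an assumption, not the paper's algorithm).

    Each document joins the first cluster whose representative (its first
    member) is within `threshold`; otherwise it starts a new cluster.
    Returns lists of indices into `docs`.
    """
    roots = [ET.fromstring(text) for text in docs]
    clusters = []
    for i, root in enumerate(roots):
        for members in clusters:
            if tree_distance(roots[members[0]], root) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical sample documents, invented for this example.
docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><author/></book>",
    "<order><item><sku/><qty/></item><item><sku/><qty/></item></order>",
    "<order><item><sku/><qty/></item></order>",
]
print(cluster(docs, threshold=3))  # [[0, 1], [2, 3]]
```

In this sketch the first document of each cluster stands in for the extracted schema; a schema-based approach would instead compare each document against a grammar summarizing the whole cluster.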