A fine-grained XML structural comparison approach

  • Authors:
  • Joe Tekli; Richard Chbeir; Kokou Yetongnon

  • Affiliations:
  • LE2I Laboratory, UMR, CNRS, University of Bourgogne, Dijon Cedex, France (all authors)

  • Venue:
  • ER'07: Proceedings of the 26th International Conference on Conceptual Modeling
  • Year:
  • 2007

Abstract

As the Web continues to grow and evolve, more and more information is being placed in structurally rich documents, XML documents in particular, so as to improve the efficiency of similarity clustering, information retrieval and data management applications. Various algorithms for comparing hierarchically structured data, e.g., XML documents, have been proposed in the literature. Most of them rely on techniques for finding the edit distance between tree structures, with XML documents modeled as ordered labeled trees. Nevertheless, a thorough investigation of current approaches led us to identify several structural similarity aspects, i.e., sub-tree-related similarities, which are not sufficiently addressed when comparing XML documents. In this paper, we provide an improved comparison method that deals with fine-grained sub-trees and leaf node repetitions without increasing overall complexity with respect to current XML comparison methods. Our approach consists of two main algorithms, one for discovering the structural commonality between sub-trees and one for computing the costs of tree-based edit operations. A prototype has been developed to evaluate the optimality and performance of our method. Experimental results on both real and synthetic XML data demonstrate better performance with respect to alternative XML comparison methods.
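
To make the baseline idea in the abstract concrete, the following is a minimal sketch of a tree edit distance over XML documents modeled as ordered labeled trees. It is not the authors' method: it is a simplified top-down (Selkow-style) distance and does not include the paper's sub-tree commonality step. The function names (`size`, `tree_distance`, `xml_distance`) and the unit costs (1 per inserted, deleted, or relabeled node) are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET


def size(node):
    """Number of element nodes in the sub-tree rooted at node."""
    return 1 + sum(size(child) for child in node)


def tree_distance(a, b):
    """Top-down edit distance between two ordered labeled trees.

    Assumed cost model (illustrative, not taken from the paper):
    relabeling a node costs 1, and inserting or deleting a whole
    sub-tree costs the number of nodes it contains.
    """
    # Relabel the root if the element tags differ.
    cost = 0 if a.tag == b.tag else 1
    kids_a, kids_b = list(a), list(b)
    m, n = len(kids_a), len(kids_b)

    # dp[i][j]: cost of turning the first i children of a
    # into the first j children of b (string-edit-style alignment).
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + size(kids_a[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + size(kids_b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + size(kids_a[i - 1]),   # delete a sub-tree
                dp[i][j - 1] + size(kids_b[j - 1]),   # insert a sub-tree
                dp[i - 1][j - 1]
                + tree_distance(kids_a[i - 1], kids_b[j - 1]),  # transform
            )
    return cost + dp[m][n]


def xml_distance(xml_a, xml_b):
    """Parse two XML strings and compare their element structure only."""
    return tree_distance(ET.fromstring(xml_a), ET.fromstring(xml_b))
```

For example, `xml_distance("<a><b/><c/></a>", "<a><b/><c/><d/></a>")` counts one inserted leaf. Because this sketch charges a whole sub-tree insertion or deletion by node count alone, it ignores whether the sub-tree resembles structure already present in the other document, which is the kind of sub-tree-related similarity the abstract says existing methods do not sufficiently address.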