A fine-grained XML structural comparison approach

Authors:
Joe Tekli; Richard Chbeir; Kokou Yetongnon
Affiliations:
LE2I Laboratory, UMR, CNRS, University of Bourgogne, Dijon Cedex, France (all authors)
Venue:
ER '07: Proceedings of the 26th International Conference on Conceptual Modeling
Year:
2007

Abstract

As the Web continues to grow and evolve, more and more information is being placed in structurally rich documents, XML documents in particular, so as to improve the efficiency of similarity clustering, information retrieval and data management applications. Various algorithms for comparing hierarchically structured data, e.g., XML documents, have been proposed in the literature. Most of them rely on techniques for computing the edit distance between tree structures, with XML documents modeled as ordered labeled trees. Nevertheless, a thorough investigation of current approaches led us to identify several structural similarity aspects, namely sub-tree-related similarities, that are not sufficiently addressed when comparing XML documents. In this paper, we provide an improved comparison method that handles fine-grained sub-trees and leaf node repetitions without increasing overall complexity with respect to current XML comparison methods. Our approach consists of two main algorithms: one for discovering the structural commonality between sub-trees and one for computing tree-based edit operation costs. A prototype has been developed to evaluate the optimality and performance of our method. Experimental results on both real and synthetic XML data demonstrate better performance with respect to alternative XML comparison methods.
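
To make the baseline concrete, the following is a minimal sketch of a classical top-down edit distance between two XML documents modeled as ordered labeled trees. It is not the method proposed in the paper and does not include its sub-tree commonality step. The function names (`size`, `tree_distance`) and the cost model (relabeling costs 1; inserting or deleting a whole sub-tree costs its number of nodes) are assumptions chosen for illustration only.

```python
import xml.etree.ElementTree as ET


def size(node):
    """Number of element nodes in the sub-tree rooted at `node`."""
    return 1 + sum(size(child) for child in node)


def tree_distance(a, b):
    """Top-down edit distance between two ordered labeled trees.

    Assumed costs (illustrative, not taken from the paper):
    relabeling a node costs 1; inserting or deleting a whole
    sub-tree costs the number of nodes it contains.
    """
    root_cost = 0 if a.tag == b.tag else 1
    kids_a, kids_b = list(a), list(b)
    m, n = len(kids_a), len(kids_b)

    # String-edit-style dynamic program over the two child sequences.
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + size(kids_a[i - 1])  # delete sub-tree
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + size(kids_b[j - 1])  # insert sub-tree
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dist[i][j] = min(
                dist[i - 1][j] + size(kids_a[i - 1]),
                dist[i][j - 1] + size(kids_b[j - 1]),
                # Transform one child sub-tree into the other, recursively.
                dist[i - 1][j - 1] + tree_distance(kids_a[i - 1], kids_b[j - 1]),
            )
    return root_cost + dist[m][n]


doc_a = ET.fromstring("<paper><title/><author/><author/></paper>")
doc_b = ET.fromstring("<paper><title/><author/></paper>")
print(tree_distance(doc_a, doc_b))  # 1: one <author> leaf is deleted
```

In this sketch, a sub-tree that is inserted or deleted is charged its full size regardless of whether a similar structure already exists in the other document. The abstract's focus on sub-tree commonalities and leaf node repetitions concerns cost models that are finer-grained than this one.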