As the Web continues to grow and evolve, more and more information is being placed in structurally rich documents, XML documents in particular, so as to improve the efficiency of similarity clustering, information retrieval and data management applications. Various algorithms for comparing hierarchically structured data, e.g., XML documents, have been proposed in the literature. Most of them rely on techniques for computing the edit distance between tree structures, with XML documents modeled as ordered labeled trees. Nevertheless, a thorough investigation of current approaches led us to identify several aspects of structural similarity, namely sub-tree related similarities, that are not sufficiently addressed when comparing XML documents. In this paper, we provide an improved comparison method that deals with fine-grained sub-trees and leaf node repetitions without increasing overall complexity with respect to current XML comparison methods. Our approach consists of two main algorithms: one for discovering the structural commonality between sub-trees, and one for computing the costs of tree-based edit operations. A prototype has been developed to evaluate the optimality and performance of our method. Experimental results, on both real and synthetic XML data, demonstrate better performance than alternative XML comparison methods.
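
To make the notion of edit distance between ordered labeled trees concrete, the following is a minimal Python sketch. It is not the method proposed in this paper: it assumes a simplified top-down variant (in the style of Selkow's tree distance) in which the only operations are relabeling a node and inserting or deleting a whole sub-tree, and it assumes unit cost per node. The names `Node`, `size` and `tree_distance` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of an ordered labeled tree (e.g., an XML element)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the sub-tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def tree_distance(a: Node, b: Node) -> int:
    """Top-down edit distance under assumed unit costs.

    Relabeling a root costs 1; inserting or deleting a sub-tree
    costs its number of nodes. The child sequences are aligned
    with a string-edit-distance style dynamic program.
    """
    relabel = 0 if a.label == b.label else 1
    m, n = len(a.children), len(b.children)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):  # delete the first i sub-trees of a
        d[i][0] = d[i - 1][0] + size(a.children[i - 1])
    for j in range(1, n + 1):  # insert the first j sub-trees of b
        d[0][j] = d[0][j - 1] + size(b.children[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a.children[i - 1]),  # delete sub-tree
                d[i][j - 1] + size(b.children[j - 1]),  # insert sub-tree
                d[i - 1][j - 1]
                + tree_distance(a.children[i - 1], b.children[j - 1]),
            )
    return relabel + d[m][n]


# Two small documents that differ by one repeated leaf node.
doc1 = Node("paper", [Node("title"), Node("author"), Node("author")])
doc2 = Node("paper", [Node("title"), Node("author")])
print(tree_distance(doc1, doc2))
```

In this sketch the repeated `author` leaf is simply counted as one deleted node. A plain edit distance of this kind does not recognize that the removed leaf repeats a node already present in the other tree, which is the sort of sub-tree and leaf-repetition similarity the abstract says existing methods handle insufficiently.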