The automatic processing and management of XML-based data are increasingly popular research topics, owing to the growing use of XML, especially on the Web. Nonetheless, several operations based on the structure of XML data have not yet received much attention. Among these is the matching of XML documents against XML grammars, which is useful in applications such as document classification, retrieval, and selective dissemination of information. In this paper, we propose an algorithm for measuring the structural similarity between an XML document and a Document Type Definition (DTD), considered the simplest way of specifying structural constraints on XML documents. We consider the various DTD operators that designate constraints on the existence, repeatability, and alternativeness of XML elements and attributes. Our approach is based on the concept of tree edit distance, an effective and efficient means of comparing tree structures, with XML documents and DTDs modeled as ordered labeled trees. It is of polynomial complexity, in contrast to existing exponential algorithms. Classification experiments, conducted on large sets of real and synthetic XML documents, underline our approach's effectiveness, as well as its applicability to large XML repositories and databases.
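To make the notion of tree edit distance on ordered labeled trees concrete, the following is a minimal sketch of the classic unit-cost recurrence (insert, delete, relabel a node). It is not the algorithm proposed in the paper: it compares two plain trees and does not model DTD operators such as `?`, `*`, `+`, or alternation. The helper names `tree`, `size`, `forest_distance`, and `tree_edit_distance`, and the unit costs, are assumptions made for illustration.

```python
from functools import lru_cache

def tree(label, *children):
    """Build an ordered labeled tree as a hashable (label, children) tuple."""
    return (label, tuple(children))

def size(t):
    """Number of nodes in a tree."""
    return 1 + sum(size(c) for c in t[1])

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f1:
        # Every remaining node of f2 must be inserted.
        return sum(size(t) for t in f2)
    if not f2:
        # Every remaining node of f1 must be deleted.
        return sum(size(t) for t in f1)
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    # Delete the root of the rightmost tree of f1; its children join the forest.
    delete = forest_distance(f1[:-1] + kids1, f2) + 1
    # Insert the root of the rightmost tree of f2.
    insert = forest_distance(f1, f2[:-1] + kids2) + 1
    # Map the two rightmost roots onto each other, relabeling if they differ.
    relabel = (forest_distance(f1[:-1], f2[:-1])
               + forest_distance(kids1, kids2)
               + (0 if label1 == label2 else 1))
    return min(delete, insert, relabel)

def tree_edit_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_distance((t1,), (t2,))

# Hypothetical example: a document with two authors against a
# structure that expects a single author.
document = tree("paper", tree("title"), tree("author"), tree("author"))
expected = tree("paper", tree("title"), tree("author"))
print(tree_edit_distance(document, expected))  # 1: one "author" node is deleted
```

A smaller distance indicates closer structural agreement; a similarity score could be derived from it by normalization, though the normalization used in the paper is not stated in this abstract.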