Structural similarity evaluation between XML documents and DTDs

Authors:
Joe Tekli;Richard Chbeir;Kokou Yetongnon
Affiliations:
LE2I Laboratory, UMR-CNRS, University of Bourgogne, Dijon Cedex, France;LE2I Laboratory, UMR-CNRS, University of Bourgogne, Dijon Cedex, France;LE2I Laboratory, UMR-CNRS, University of Bourgogne, Dijon Cedex, France
Venue:
WISE'07 Proceedings of the 8th international conference on Web information systems engineering
Year:
2007

Abstract

The automatic processing and management of XML-based data are increasingly popular research topics, owing to the growing use of XML, especially on the Web. Nonetheless, several operations based on the structure of XML data have not yet received much attention. Among these is the matching of XML documents against XML grammars, which is useful in applications such as document classification, retrieval, and selective dissemination of information. In this paper, we propose an algorithm for measuring the structural similarity between an XML document and a Document Type Definition (DTD), regarded as the simplest way of specifying structural constraints on XML documents. We consider the various DTD operators that designate constraints on the existence, repeatability, and alternativeness of XML elements and attributes. Our approach is based on tree edit distance, an effective and efficient means of comparing tree structures, with XML documents and DTDs modeled as ordered labeled trees. The algorithm is of polynomial complexity, whereas existing algorithms are exponential. Classification experiments, conducted on large sets of real and synthetic XML documents, underline the effectiveness of our approach, as well as its applicability to large XML repositories and databases.
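
To make the notion of tree edit distance mentioned in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It computes a unit-cost edit distance (insert, delete, relabel) between two ordered labeled trees using the textbook forest recurrence with memoization. It does not handle DTD operators such as `?`, `*`, `+`, or alternatives; both inputs are plain trees. The helper names `tree`, `forest_dist`, `tree_edit_distance`, and `similarity`, and the normalization `1 / (1 + distance)`, are assumptions chosen for illustration.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an ordered labeled tree as a hashable (label, children) tuple."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_dist(f, g):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children join the forest.
        _, kids = g[-1]
        return forest_dist(f, g[:-1] + kids) + 1
    if not g:
        # Delete the rightmost root of f; its children join the forest.
        _, kids = f[-1]
        return forest_dist(f[:-1] + kids, g) + 1
    (f_label, f_kids), (g_label, g_kids) = f[-1], g[-1]
    delete = forest_dist(f[:-1] + f_kids, g) + 1
    insert = forest_dist(f, g[:-1] + g_kids) + 1
    # Map the two rightmost roots onto each other (relabel if they differ),
    # then solve their subtrees and the remaining forests independently.
    relabel = 0 if f_label == g_label else 1
    match = forest_dist(f_kids, g_kids) + forest_dist(f[:-1], g[:-1]) + relabel
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    """Edit distance between two single trees."""
    return forest_dist((t1,), (t2,))


def similarity(t1, t2):
    """Assumed normalization: 1 for identical trees, approaching 0 as they diverge."""
    return 1.0 / (1.0 + tree_edit_distance(t1, t2))


# Hypothetical example: a document tree and a flat, operator-free template tree.
doc = tree("paper", tree("title"), tree("author"), tree("author"))
template = tree("paper", tree("title"), tree("author"), tree("year"))

print(tree_edit_distance(doc, template))  # 1 (relabel the last child)
print(similarity(doc, template))          # 0.5
```

This sketch is unoptimized and intended only for small trees; a production comparison against a real DTD would need to account for the cardinality and alternation constraints that the abstract describes.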