A matching algorithm for measuring the structural similarity between an XML document and a DTD and its applications

Authors:
Elisa Bertino;Giovanna Guerrini;Marco Mesiti
Affiliations:
Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy;Dipartimento di Informatica, Università degli Studi di Pisa, Via Buonarroti 2, 56127 Pisa, Italy;Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy
Venue:
Information Systems - Special issue on web data integration
Year:
2004


Abstract

In this paper we propose a matching algorithm for measuring the structural similarity between an XML document and a DTD. By comparing the document's structure against the one the DTD requires, the algorithm identifies commonalities and differences. Differences arise either from extra elements, which are present in the document but not required by the DTD, or from required elements that are absent. Evaluating the commonalities and differences gives rise to a numerical rank of the structural similarity. The paper also discusses some applications of the matching algorithm: the classification of XML documents against a set of DTDs, the evolution of the DTD structure, the evaluation of structural queries, the selective dissemination of XML documents, and the protection of XML document contents.
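
To make the idea of ranking commonalities and differences concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a heavily simplified setting: the DTD is reduced to a plain tree of required elements (DTD operators such as `?`, `*`, `+` and `|` are ignored), children are matched by tag name only, and the score is the share of common elements among all counted elements. The names `Node`, `match`, `similarity`, and the weights `alpha` and `beta` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An element in a document tree or in a simplified 'required structure' tree."""
    tag: str
    children: list = field(default_factory=list)


def size(node):
    """Number of elements in the subtree rooted at node."""
    return 1 + sum(size(c) for c in node.children)


def match(doc, req):
    """Return (plus, minus, common) element counts.

    plus   = elements in the document that the required structure lacks
    minus  = required elements that the document lacks
    common = elements present in both
    """
    if doc.tag != req.tag:
        # Nothing in common: every element counts as a difference.
        return (size(doc), size(req), 0)
    plus, minus, common = 0, 0, 1
    unmatched = list(req.children)
    for child in doc.children:
        # Assumption: the first unmatched required child with the same tag is the partner.
        partner = next((r for r in unmatched if r.tag == child.tag), None)
        if partner is None:
            plus += size(child)
        else:
            unmatched.remove(partner)
            p, m, c = match(child, partner)
            plus, minus, common = plus + p, minus + m, common + c
    minus += sum(size(r) for r in unmatched)
    return (plus, minus, common)


def similarity(doc, req, alpha=1.0, beta=1.0):
    """Rank in [0, 1]; alpha and beta are hypothetical weights for extra and missing elements."""
    plus, minus, common = match(doc, req)
    total = alpha * plus + beta * minus + common
    return common / total if total else 0.0


doc = Node("a", [Node("b"), Node("d")])
req = Node("a", [Node("b"), Node("c")])
print(match(doc, req))       # (1, 1, 2)
print(similarity(doc, req))  # 0.5
```

Under these assumptions, classifying a document against a set of DTDs would amount to computing `similarity` against each simplified structure and picking the highest rank.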