PORSCHE: Performance ORiented SCHEma mediation

Authors:
Khalid Saleem; Zohra Bellahsene; Ela Hunt
Affiliations:
LIRMM-UMR 5506, Université Montpellier 2, 34392 Montpellier, France; LIRMM-UMR 5506, Université Montpellier 2, 34392 Montpellier, France; GlobIS, Department of Computer Science, ETH Zurich, CH-8092 Zurich, Switzerland
Venue:
Information Systems
Year:
2008

Abstract

Semantic matching of schemas in heterogeneous data sharing systems is time-consuming and error-prone. Existing mapping tools employ semi-automatic techniques that map two schemas at a time, which makes them unsuitable for large-scale scenarios in which data sharing involves a large number of data sources. We present a new, robust, automatic method that discovers semantic schema matches in a large set of XML schemas, incrementally creates an integrated schema encompassing all schema trees, and defines mappings from the contributing schemas to the integrated schema. Our method, PORSCHE (Performance ORiented SCHEma mediation), takes a holistic approach: it first clusters the nodes by linguistic label similarity, and then applies a tree mining technique that uses node ranks calculated during a depth-first traversal. This minimises the target node search space and improves performance, making the technique suitable for large-scale data sharing. The PORSCHE framework is hybrid in nature and flexible enough to incorporate further matching techniques or algorithms. We report on experiments with up to 80 schemas containing 83,770 nodes; our prototype implementation matches and merges them in 587 s on average, producing an integrated schema together with mappings from all input schemas to it. Matching quality is evaluated with precision, recall and F-measure on randomly selected pairs of schemas from the same domain. We also discuss the integrity of the mediated schema in terms of completeness and minimality measures.
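The cluster-then-mine idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the `Node` class, the helper names (`assign_ranks`, `similar`, `cluster_labels`, `place`, `merge`), the difflib-based label similarity with a 0.8 threshold, and the rule that every schema root maps onto one mediated root are all hypothetical choices made here for illustration. What it shows is how preorder ranks and subtree sizes from a depth-first traversal turn "is a descendant of" into an integer interval test, so a source node is only compared with mediated nodes that share its label cluster and lie inside the subtree of the node its parent was mapped to.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    rank: int = -1  # depth-first preorder rank; -1 means "not ranked yet"
    size: int = 1   # number of nodes in this subtree, itself included


def assign_ranks(root: Node) -> list[Node]:
    """Preorder ranks; descendants of n have ranks n.rank+1 .. n.rank+n.size-1."""
    order: list[Node] = []

    def visit(n: Node) -> None:
        n.rank = len(order)
        order.append(n)
        for child in n.children:
            visit(child)
        n.size = len(order) - n.rank

    visit(root)
    return order


def similar(a: str, b: str, threshold: float) -> bool:
    """Assumed linguistic similarity: difflib ratio on lower-cased labels."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster_labels(labels: list[str], threshold: float) -> dict[str, int]:
    """Greedy clustering of the labels of all schemas at once (holistic)."""
    reps: list[str] = []
    cluster_of: dict[str, int] = {}
    for label in labels:
        for cid, rep in enumerate(reps):
            if similar(label, rep, threshold):
                cluster_of[label] = cid
                break
        else:  # no similar representative: start a new cluster
            cluster_of[label] = len(reps)
            reps.append(label)
    return cluster_of


def place(src: Node, parent: Node, by_cluster: dict[int, list[Node]],
          cluster_of: dict[str, int], mapping: dict[int, Node]) -> None:
    """Map src somewhere inside the mediated subtree rooted at parent."""
    # Search space: same label cluster AND a rank inside parent's subtree interval.
    # Nodes created during this pass keep rank -1, so nothing is found under them.
    candidates = [m for m in by_cluster.get(cluster_of[src.label], [])
                  if parent.rank < m.rank < parent.rank + parent.size]
    if candidates:
        target = min(candidates, key=lambda m: m.rank)  # earliest in preorder
    else:
        target = Node(src.label)  # no match: extend the mediated schema
        parent.children.append(target)
    mapping[src.rank] = target
    for child in src.children:
        place(child, target, by_cluster, cluster_of, mapping)


def merge(schemas: list[Node], threshold: float = 0.8
          ) -> tuple[Node, list[dict[int, Node]]]:
    """Return the mediated root and, per schema, source rank -> mediated node."""
    labels: list[str] = []
    for schema in schemas:
        for n in assign_ranks(schema):  # also fixes the source ranks used as keys
            if n.label not in labels:
                labels.append(n.label)
    cluster_of = cluster_labels(labels, threshold)

    mediated = Node(schemas[0].label)  # assumption: all roots denote one concept
    mappings: list[dict[int, Node]] = []
    for schema in schemas:
        by_cluster: dict[int, list[Node]] = {}
        for m in assign_ranks(mediated):  # re-rank once per input schema
            by_cluster.setdefault(cluster_of[m.label], []).append(m)
        mapping = {schema.rank: mediated}
        for child in schema.children:
            place(child, mediated, by_cluster, cluster_of, mapping)
        mappings.append(mapping)
    return mediated, mappings


if __name__ == "__main__":
    a = Node("book", [Node("title"), Node("author", [Node("name")])])
    b = Node("Book", [Node("Title"), Node("Author", [Node("Name")]), Node("Price")])
    c = Node("Book", [Node("Name")])
    mediated, maps = merge([a, b, c])
    print([n.label for n in assign_ranks(mediated)])  # ['book', 'title', 'author', 'name', 'Price']
    print(maps[2][1].label)  # 'name': c's Name found inside the author subtree
```

In the demo, schema `c` places `Name` directly under `Book`, yet it is mapped onto the existing `name` node nested under `author`, because the search covers the whole rank interval of the mediated root rather than only its children. The sketch does not stop two source nodes from mapping onto the same mediated node; a real matcher would need such a guard.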
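Precision, recall and F-measure are used here in their standard sense: the share of proposed matches that are correct, the share of correct matches that were found, and the harmonic mean of the two. The minimal sketch below assumes each match is a pair of element paths compared against a manually determined set of correct matches; the function name `match_quality` and the example paths are hypothetical and are not taken from the paper's experiments.

```python
from __future__ import annotations


def match_quality(found: set[tuple[str, str]],
                  expected: set[tuple[str, str]]) -> tuple[float, float, float]:
    """Return (precision, recall, F-measure) of proposed vs. correct matches.

    precision = |found & expected| / |found|
    recall    = |found & expected| / |expected|
    F-measure = 2 * precision * recall / (precision + recall)
    """
    correct = len(found & expected)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical manual (gold-standard) matches between two toy schemas.
    expected = {("book/title", "Book/Title"), ("book/author", "Book/Author"),
                ("book/author/name", "Book/Author/Name")}
    # Proposed matches: two correct, one wrong.
    found = {("book/title", "Book/Title"), ("book/author", "Book/Author"),
             ("book/author/name", "Book/Price")}
    print(match_quality(found, expected))  # precision, recall and F are all 2/3
```

Because the F-measure is a harmonic mean, it stays low unless both precision and recall are high, which is why it is commonly reported alongside them.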