Semantic matching of schemas in heterogeneous data sharing systems is time-consuming and error-prone. Existing mapping tools employ semi-automatic techniques for mapping two schemas at a time. In a large-scale scenario, where data sharing involves a large number of data sources, such techniques are not suitable. We present a new robust automatic method which discovers semantic schema matches in a large set of XML schemas, incrementally creates an integrated schema encompassing all schema trees, and defines mappings from the contributing schemas to the integrated schema. Our method, PORSCHE (Performance ORiented SCHEma mediation), utilises a holistic approach which first clusters the nodes based on linguistic label similarity. It then applies a tree mining technique using node ranks calculated during depth-first traversal. This minimises the target node search space and improves performance, which makes the technique suitable for large-scale data sharing. The PORSCHE framework is hybrid in nature and flexible enough to incorporate more matching techniques or algorithms. We report on experiments with up to 80 schemas containing 83,770 nodes, with our prototype implementation taking 587 s on average to match and merge them, resulting in an integrated schema and returning mappings from all input schemas to the integrated schema. The quality of matching in PORSCHE is shown using precision, recall and F-measure on randomly selected pairs of schemas from the same domain. We also discuss the integrity of the mediated schema in the light of completeness and minimality measures.
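As a rough illustration of the two ideas named above (label-based clustering and node ranks from a depth-first traversal), the following Python sketch shows one way such ranks can shrink the search space. It is not the PORSCHE implementation: the `Node` class, the `scope` field, the `normalise` tokeniser and the `candidates` helper are all hypothetical names chosen for this example, and exact matching of normalised labels stands in for whatever linguistic similarity measure the method really uses.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    rank: int = -1   # pre-order number assigned during depth-first traversal
    scope: int = -1  # rank of the last node in this node's subtree


def assign_ranks(root: Node) -> int:
    """Number all nodes in pre-order and record subtree scopes; return node count."""
    counter = 0

    def visit(node: Node) -> None:
        nonlocal counter
        node.rank = counter
        counter += 1
        for child in node.children:
            visit(child)
        node.scope = counter - 1  # last rank handed out inside this subtree

    visit(root)
    return counter


def iter_nodes(root: Node) -> Iterator[Node]:
    """Yield nodes in pre-order without recursion."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def normalise(label: str) -> str:
    """Assumed stand-in for linguistic matching: split camelCase and separators, lowercase."""
    return " ".join(p.lower() for p in re.findall(r"[A-Za-z][a-z]*|\d+", label))


def cluster_by_label(root: Node) -> Dict[str, List[Node]]:
    """Group nodes whose normalised labels are identical."""
    clusters: Dict[str, List[Node]] = defaultdict(list)
    for node in iter_nodes(root):
        clusters[normalise(node.label)].append(node)
    return clusters


def candidates(clusters: Dict[str, List[Node]], label: str, within: Node) -> List[Node]:
    """Nodes in the same label cluster that lie inside the subtree rooted at `within`."""
    return [n for n in clusters.get(normalise(label), [])
            if within.rank <= n.rank <= within.scope]


# A small integrated schema tree with two "name" nodes in different contexts.
book = Node("book", [Node("title"), Node("author", [Node("name")])])
publisher = Node("publisher", [Node("name")])
root = Node("library", [book, publisher])

total = assign_ranks(root)
clusters = cluster_by_label(root)
# If a source element "Name" has an ancestor already mapped to `book`,
# only the "name" node inside book's rank range needs to be considered.
hits = candidates(clusters, "Name", book)
```

With ranks and scopes in place, the ancestor test reduces to two integer comparisons, so restricting a label cluster to one subtree needs no further tree traversal. In this example the cluster for "name" holds two nodes, and the scope filter leaves only the one under `book`.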