Matching large schemas: Approaches and evaluation

Authors:
Hong-Hai Do;Erhard Rahm
Affiliations:
Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstr. 16-18, 04107 Leipzig, Germany;Department of Computer Science, University of Leipzig, Augustusplatz 10-11, 04109 Leipzig, Germany
Venue:
Information Systems
Year:
2007


Abstract

Current schema matching approaches still need improvement for large and complex schemas. The large search space increases both the likelihood of false matches and the execution time. Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages, in particular user-defined types and classes, component reuse capabilities, and support for distributed schemas and namespaces. To better assist the user in matching complex schemas, we have developed a new generic schema matching tool, COMA++, which provides a library of individual matchers and a flexible infrastructure to combine the matchers and refine their results. Different match strategies can be applied, including a new scalable approach to identify context-dependent correspondences between schemas with shared elements and a fragment-based match approach that decomposes a large match task into smaller tasks. We conducted a comprehensive evaluation of the match strategies using large e-Business standard schemas. Besides providing helpful insights for future match implementations, the evaluation demonstrated the practicability of our system for matching large schemas.
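The idea of combining a library of individual matchers can be illustrated with a minimal Python sketch. It is not COMA++'s implementation or API. The two matchers (`edit_sim`, `token_sim`), the averaging step, and the 0.5 threshold are assumptions chosen only to show how several similarity scores for one element pair might be aggregated and filtered.

```python
import re
from difflib import SequenceMatcher


def edit_sim(a, b):
    """String similarity of two element names (difflib ratio, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tokens(name):
    """Split a camelCase or snake_case name into a set of lowercase tokens."""
    return {t.lower() for t in re.findall(r"[A-Za-z][a-z]*|\d+", name)}


def token_sim(a, b):
    """Jaccard overlap of the name tokens of two elements."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


# Hypothetical matcher library; a real tool would offer many more matchers.
MATCHERS = [edit_sim, token_sim]


def match(source, target, threshold=0.5):
    """Average all matcher scores per element pair and keep pairs at or above threshold."""
    result = []
    for s in source:
        for t in target:
            score = sum(m(s, t) for m in MATCHERS) / len(MATCHERS)
            if score >= threshold:
                result.append((s, t, score))
    return result


if __name__ == "__main__":
    for s, t, score in match(["shipToCity"], ["ship_to_city", "price"]):
        print(s, t, round(score, 2))
```

A fragment-based strategy in the spirit of the abstract could call `match` separately on smaller element groups instead of on the full schemas, which reduces the number of pairs compared in each call.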