Restricting the overlap of Top-N sets in schema matching

Authors:
Eric Peukert;Erhard Rahm
Affiliations:
SAP Research Dresden, Dresden, Germany;University of Leipzig, Leipzig, Germany
Venue:
Proceedings of the 1st Workshop on New Trends in Similarity Search
Year:
2011



Abstract

Computing similarities between metadata elements is an essential process in schema and ontology matching systems. Such systems aim to reduce the manual effort of finding mappings for data integration or ontology alignment. Similarity measures compute syntactic, semantic, or structural similarities of metadata elements. Typically, different similarities are combined and the most similar element pairs are selected to produce a best-1 mapping suggestion. Unfortunately, automatic schema matching systems are rarely adopted commercially, since correcting the best-1 mapping suggestion is often harder than finding the mapping manually. To alleviate this, schema matching must be used incrementally by computing Top-N mapping suggestions from which the user can select. However, current similarity measures and selection operators suggest the same target elements for many different source elements. This effect, which we call overlap, significantly reduces the quality of schema matching. To address this problem, we first introduce a new weighted token similarity measure that implicitly decreases the overlap between Top-N sets. Second, we introduce a new Top-N selection operator that increases recall by restricting overlap directly. We evaluate our approaches on large, real-world matching problems and show the positive effect on match quality.
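
The overlap effect described in the abstract can be illustrated with a small Python sketch. This is not the operator from the paper, whose details the abstract does not give. It is a minimal illustration under stated assumptions: the similarity values and element names are invented, the greedy strategy is one plausible way to cap overlap, and the `max_uses` parameter is hypothetical.

```python
from collections import Counter

# Hypothetical similarity matrix: source element -> {target element: similarity}.
# All names and values are invented for illustration.
sim = {
    "custName": {"name": 0.9, "customer": 0.6, "city": 0.2},
    "shipName": {"name": 0.8, "recipient": 0.7, "city": 0.1},
    "billName": {"name": 0.7, "payer": 0.5, "city": 0.3},
}


def top_n(sim, n):
    """Plain Top-N: for each source, keep the n most similar targets."""
    return {
        src: [t for t, _ in sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for src, row in sim.items()
    }


def target_counts(selection):
    """Count how many Top-N sets each target element appears in."""
    return Counter(t for targets in selection.values() for t in targets)


def top_n_restricted(sim, n, max_uses):
    """Assumed greedy variant: visit all pairs by descending similarity and
    accept a pair only if the source still has room in its Top-N set and the
    target has been suggested fewer than max_uses times."""
    pairs = sorted(
        ((s, t, v) for s, row in sim.items() for t, v in row.items()),
        key=lambda p: (-p[2], p[0], p[1]),
    )
    selection = {s: [] for s in sim}
    uses = Counter()
    for s, t, _ in pairs:
        if len(selection[s]) < n and uses[t] < max_uses:
            selection[s].append(t)
            uses[t] += 1
    return selection


plain = top_n(sim, 1)
restricted = top_n_restricted(sim, 1, max_uses=1)
print(target_counts(plain))   # "name" is suggested for all three sources
print(restricted)             # each source receives a distinct target
```

In this toy input, plain Top-1 selection suggests `name` for every source element, which is the overlap the abstract describes. Capping how often a target may be reused forces the lower-ranked but distinct candidates into the suggestion sets. Whether this improves recall in practice depends on the data.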