Analyzing and revising data integration schemas to improve their matchability

Authors:
Xiaoyong Chai; Mayssam Sayyadian; AnHai Doan; Arnon Rosenthal; Len Seligman
Affiliations:
University of Wisconsin-Madison; University of Wisconsin-Madison; University of Wisconsin-Madison; The MITRE Corporation; The MITRE Corporation
Venue:
Proceedings of the VLDB Endowment
Year:
2008

Abstract

Data integration systems often provide a uniform query interface, called a mediated schema, to a multitude of data sources. To answer user queries, such systems employ a set of semantic matches between the mediated schema and the data-source schemas. Finding such matches is well known to be difficult, so much work has focused on developing semi-automatic techniques to find them efficiently. In this paper we consider the complementary problem of improving the mediated schema itself, to make such matches easier to find. Specifically, a mediated schema S will typically be matched with many source schemas. Can the developer of S therefore analyze and revise S in a way that preserves its semantics yet makes it easier to match in the future? We answer this question affirmatively and outline a promising solution direction, called mSeer. Given a mediated schema S and a matching tool M, mSeer first computes a matchability score that quantifies how well S can be matched using M. Next, mSeer uses this score to generate a matchability report that identifies the problems in matching S. Finally, mSeer addresses these problems by automatically suggesting changes to S (e.g., renaming an attribute or reformatting data values) that it believes will preserve the semantics of S yet make it more amenable to matching. We present extensive experiments over several real-world domains that demonstrate the promise of the proposed approach.
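The abstract leaves the matchability score undefined. The Python sketch below assumes one plausible reading: take source schemas whose correct correspondences to S are known, run M on each, and score every attribute of S by how often M recovers its correspondence. Attributes scoring below a threshold form the report, and a candidate renaming is kept if it raises the score. The name-only matcher `toy_matcher`, the hand-written synthetic schemas, and the helpers `matchability`, `report`, and `suggest_rename` are hypothetical illustrations, not mSeer's published algorithm or interface.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for the matching tool M (not mSeer's matcher):
# map each source attribute to the mediated attribute whose name is most
# similar, using difflib's ratio (2 * matches / total length, in [0, 1]).
def toy_matcher(mediated, source_attrs):
    def sim(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return {s: max(mediated, key=lambda m: sim(s, m)) for s in source_attrs}


def matchability(mediated, synthetic_sources, matcher):
    """Assumed score: for each mediated attribute, the fraction of its
    occurrences in synthetic sources that the matcher maps back to it.

    synthetic_sources: list of dicts {source_attr: correct_mediated_attr}.
    """
    hits = {m: 0 for m in mediated}
    total = {m: 0 for m in mediated}
    for truth in synthetic_sources:
        predicted = matcher(mediated, list(truth))
        for src, gold in truth.items():
            total[gold] += 1
            if predicted[src] == gold:
                hits[gold] += 1
    return {m: hits[m] / total[m] for m in mediated if total[m]}


def report(scores, threshold):
    """Hypothetical report: attributes scoring below threshold, worst first."""
    return sorted((m for m, s in scores.items() if s < threshold), key=scores.get)


def suggest_rename(mediated, attr, candidates, synthetic_sources, matcher):
    """Try candidate names for attr (supplied here by hand) and return
    (best_name, score); best_name is None if no candidate beats the
    current score. Gold labels in the synthetic sources follow the rename."""
    best, best_score = None, matchability(mediated, synthetic_sources, matcher)[attr]
    for new in candidates:
        revised = [new if m == attr else m for m in mediated]
        relabeled = [{s: (new if g == attr else g) for s, g in t.items()}
                     for t in synthetic_sources]
        score = matchability(revised, relabeled, matcher)[new]
        if score > best_score:
            best, best_score = new, score
    return best, best_score


# Toy mediated schema and two hand-written "synthetic" source schemas.
mediated = ["price", "phone", "zip"]
synthetic = [
    {"Price": "price", "Phone": "phone", "Zip": "zip"},
    {"amount": "price", "telephone": "phone", "zipcode": "zip"},
]
scores = matchability(mediated, synthetic, toy_matcher)
print(scores)                          # {'price': 0.5, 'phone': 1.0, 'zip': 1.0}
print(report(scores, threshold=0.75))  # ['price']
print(suggest_rename(mediated, "price", ["price_amount"], synthetic, toy_matcher))
# ('price_amount', 1.0)
```

In this toy run the name-only matcher maps "amount" to "phone", so "price" scores 0.5; renaming it to "price_amount" lets the same matcher recover both of its correspondences. The sketch does not check whether a suggested rename actually preserves the semantics of S.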