Schema Matching Using Interattribute Dependencies

Authors:
Jaewoo Kang;Jeffrey F. Naughton
Affiliations:
Korea University, Seoul;University of Wisconsin, Madison
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2008

Abstract

Schema matching is one of the key challenges in information integration. It is a labor-intensive and time-consuming process. To alleviate the problem, many automated solutions have been proposed. Most existing solutions rely mainly on the textual similarity of the data to be matched. However, there are instances of the schema matching problem to which they do not apply at all. Such instances typically arise when the column names in the schemas and the data in the columns are opaque or very difficult to interpret. In our previous work [36], we proposed a two-step technique to address this problem. In the first step, we measure the dependencies between attributes within each table using an information-theoretic measure and construct a dependency graph for each table that captures these dependencies. In the second step, we find matching node pairs across the dependency graphs by running a graph matching algorithm. Our previous work experimentally validated the accuracy of the approach. One remaining challenge is the computational complexity of the graph matching problem in the second step. In this paper, we extend the previous work by improving the second step of the algorithm, incorporating efficient approximation algorithms into the framework.
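
The two-step technique described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes mutual information as the information-theoretic measure (with each attribute's entropy on the diagonal of the dependency matrix), a squared-difference cost between dependency graphs, and an exhaustive search over one-to-one attribute mappings. The function names and toy tables are hypothetical. The exhaustive search is factorial in the number of attributes, which is the kind of cost that motivates approximation algorithms for the second step; those algorithms are not reproduced here.

```python
from collections import Counter
from itertools import permutations
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y); equals H(X) when xs and ys are the same column."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def dependency_graph(columns):
    """Step 1: weighted adjacency matrix of pairwise mutual information."""
    k = len(columns)
    return [[mutual_information(columns[i], columns[j]) for j in range(k)]
            for i in range(k)]


def match(graph_a, graph_b):
    """Step 2 (assumed brute force): find the one-to-one mapping of
    attributes of A onto attributes of B that minimizes the squared
    difference between corresponding graph weights."""
    k = len(graph_a)
    best, best_cost = None, float("inf")
    for perm in permutations(range(k)):
        cost = sum((graph_a[i][j] - graph_b[perm[i]][perm[j]]) ** 2
                   for i in range(k) for j in range(k))
        if cost < best_cost:
            best, best_cost = perm, cost
    return best


# Toy table A: column 1 determines column 0; column 2 is independent of both.
table_a = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 2, 2, 3, 3],
    [0, 1, 0, 1, 0, 1, 0, 1],
]

# Toy table B: the same data with opaque recoded values and reordered
# columns (B[0] ~ A[2], B[1] ~ A[0], B[2] ~ A[1]).
table_b = [
    ["p", "q", "p", "q", "p", "q", "p", "q"],
    ["u", "u", "u", "u", "v", "v", "v", "v"],
    ["k", "k", "m", "m", "n", "n", "r", "r"],
]

mapping = match(dependency_graph(table_a), dependency_graph(table_b))
print(mapping)  # (1, 2, 0): A[0] -> B[1], A[1] -> B[2], A[2] -> B[0]
```

In this sketch the match is recovered without comparing column names or values across tables: only the dependency structure within each table is used, which is why the approach remains applicable when names and data are opaque.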