Schema matching and embedded value mapping for databases with opaque column names and mixed continuous and discrete-valued data fields

Authors:
Anuj Jaiswal; David J. Miller; Prasenjit Mitra
Affiliations:
The Pennsylvania State University; The Pennsylvania State University; The Pennsylvania State University
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2013

Abstract

Schema matching and value mapping across two information sources, such as databases, are critical information aggregation tasks. Before data from multiple tables can be integrated, the columns and values within the tables must be matched. The complexity of both problems grows quickly with the number of attributes to be matched and with the multiple semantics that data values can carry. Traditional research has mostly tackled schema matching and value mapping independently, and for categorical (discrete-valued) attributes. We propose novel methods that leverage value mappings to enhance schema matching in the presence of opaque column names, for schemas consisting of both continuous and discrete-valued attributes. An additional source of complexity is that a discrete-valued attribute in one schema could in fact be a quantized, encoded version of a continuous-valued attribute in the other schema. In our approach, which can tackle both “onto” and bijective schema matching, the fitness objective for matching a pair of attributes from two schemas exploits the statistical distribution over values within the two attributes. Suitable fitness objectives are based on Euclidean distance and the data log-likelihood, both of which are applied in our experimental study. Our matching methods use a heuristic local descent optimization strategy with two-opt switching to optimize attribute matches while simultaneously embedding value mappings. Our experiments show that the proposed techniques matched mixed continuous and discrete-valued attribute schemas with high accuracy and thus should be a useful addition to a framework of (semi-)automated tools for data alignment.
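To make the idea of local descent with two-opt switching concrete, here is a minimal Python sketch. It is not the authors' implementation, and it rests on several simplifying assumptions: only bijective matching between discrete-valued columns is handled, the fitness is a Euclidean distance between value-frequency profiles, and the embedded value mapping is approximated crudely by pairing the k-th most frequent value of one column with the k-th most frequent value of the other. The function names `sorted_freqs`, `pair_cost`, and `two_opt_match` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def sorted_freqs(column, size):
    """Relative value frequencies, sorted descending and zero-padded to `size`."""
    counts = Counter(column)
    total = len(column)
    freqs = sorted((c / total for c in counts.values()), reverse=True)
    return freqs + [0.0] * (size - len(freqs))


def pair_cost(col_a, col_b):
    """Euclidean distance between the sorted frequency profiles of two columns.

    Assumption: sorting by frequency stands in for an embedded value mapping,
    since opaque values cannot be compared by name.
    """
    size = max(len(set(col_a)), len(set(col_b)))
    fa, fb = sorted_freqs(col_a, size), sorted_freqs(col_b, size)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def two_opt_match(schema_a, schema_b):
    """Local descent over bijective matchings using pairwise (two-opt) swaps."""
    n = len(schema_a)
    assert n == len(schema_b), "this sketch covers only the bijective case"
    cost = [[pair_cost(a, b) for b in schema_b] for a in schema_a]
    match = list(range(n))  # match[i] = column of schema_b assigned to column i
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(n), 2):
            current = cost[i][match[i]] + cost[j][match[j]]
            swapped = cost[i][match[j]] + cost[j][match[i]]
            if swapped < current - 1e-12:  # accept only strict improvements
                match[i], match[j] = match[j], match[i]
                improved = True
    return match


# Two toy schemas whose columns are permuted and whose values are re-encoded.
schema_a = [["x", "x", "x", "y"], ["p", "q", "r", "s"], ["m", "m", "n", "n"]]
schema_b = [[1, 2, 3, 4], [7, 7, 8, 8], [0, 0, 0, 5]]
print(two_opt_match(schema_a, schema_b))
```

Because every accepted swap strictly lowers the total cost, the loop terminates, but like any local descent it may stop at a local optimum rather than the best matching. Handling continuous attributes, quantized encodings, "onto" matching, or a log-likelihood objective would require a different `pair_cost` and search space than this sketch provides.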