An evolutionary approach to complex schema matching

Authors:
Moisés Gomes de Carvalho;Alberto H. F. Laender;Marcos André Gonçalves;Altigran S. da Silva
Affiliations:
Instituto Nokia de Tecnologia, Manaus, AM, Brazil;Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil;Departamento de Ciência da Computação, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil;Instituto de Computação, Universidade Federal do Amazonas, Manaus, AM, Brazil
Venue:
Information Systems
Year:
2013

Abstract

Schema matching is the task of finding semantic relationships between schema elements that exist in different data repositories. Although elaborate graphical tools exist to help find such matches, the task is usually done manually. In this paper, we propose a novel evolutionary approach to automatically finding complex matches between the schemas of semantically related data repositories. To the best of our knowledge, this is the first approach capable of discovering complex schema matches using only the data instances. Because we exploit only the data stored in the repositories, we rely on matching strategies based on record deduplication (the entity-oriented strategy) and information retrieval (the value-oriented strategy) to find complex schema matches during the evolutionary process. To demonstrate the effectiveness of our approach, we conducted an experimental evaluation using real-world and synthetic datasets. The results show that our approach finds complex matches with high accuracy, similar to that obtained by more elaborate (hybrid) approaches, despite using only evidence based on the data instances.
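
To make the idea of instance-only complex matching concrete, the following is a minimal illustrative sketch, not the method described in the paper. It assumes a complex match is simply a space-joined concatenation of source columns, scores a candidate with a value-oriented measure (Jaccard overlap between derived and target value sets), and searches with a mutation-only evolutionary loop. All names (`value_fitness`, `mutate`, `evolve`, the toy `SOURCE` table) are hypothetical, and the entity-oriented (record deduplication) strategy is not modeled.

```python
import random

def concat_values(rows, columns):
    """Build one string per record by joining the chosen source columns."""
    return {" ".join(row[c] for c in columns) for row in rows}

def value_fitness(rows, columns, target_values):
    """Value-oriented score: Jaccard overlap of derived and target value sets."""
    derived = concat_values(rows, columns)
    target = set(target_values)
    union = derived | target
    return len(derived & target) / len(union) if union else 0.0

def mutate(columns, all_columns, rng):
    """Randomly replace, append, or drop one column of a candidate match."""
    cols = list(columns)
    op = rng.choice(["replace", "append", "drop"])
    if op == "replace":
        cols[rng.randrange(len(cols))] = rng.choice(all_columns)
    elif op == "append" and len(cols) < 3:   # assumed cap on match size
        cols.append(rng.choice(all_columns))
    elif op == "drop" and len(cols) > 1:     # never produce an empty match
        cols.pop(rng.randrange(len(cols)))
    return tuple(cols)

def evolve(rows, target_values, generations=30, pop_size=12, seed=0):
    """Mutation-only evolutionary search with elitist truncation selection."""
    rng = random.Random(seed)
    all_columns = sorted(rows[0])
    population = [(rng.choice(all_columns),) for _ in range(pop_size)]

    def score(candidate):
        return value_fitness(rows, candidate, target_values)

    for _ in range(generations):
        # Keep the better half, then refill with mutated copies of survivors.
        survivors = sorted(population, key=score, reverse=True)[: pop_size // 2]
        children = [mutate(rng.choice(survivors), all_columns, rng)
                    for _ in survivors]
        population = survivors + children
    return max(population, key=score)

# Toy data (invented for illustration): the target "full name" column
# corresponds to the concatenation of the source columns "first" and "last".
SOURCE = [
    {"first": "Ada", "last": "Lovelace", "city": "London"},
    {"first": "Alan", "last": "Turing", "city": "Wilmslow"},
    {"first": "Grace", "last": "Hopper", "city": "Arlington"},
]
TARGET_FULL_NAME = ["Ada Lovelace", "Alan Turing", "Grace Hopper"]
```

On this toy input, `value_fitness(SOURCE, ("first", "last"), TARGET_FULL_NAME)` is 1.0, while the reversed order `("last", "first")` scores 0.0, which is the signal `evolve(SOURCE, TARGET_FULL_NAME)` uses to rank candidates. An exact-overlap score like this is brittle on noisy data; a token- or similarity-based measure would likely be needed in practice.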