Schema Matching Using Duplicates

Authors: Alexander Bilke; Felix Naumann
Affiliations: Technische Universität Berlin; Humboldt-Universität zu Berlin
Venue: ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Year: 2005

Abstract

Most data integration applications require a matching between the schemas of the respective data sets. We show how duplicates occurring across these data sets can be exploited to identify matching attributes automatically. We describe an algorithm that first discovers duplicates among data sets with unaligned schemas and then uses these duplicates to match schemas whose column names are opaque. Discovering duplicates among data sets with unaligned schemas is harder than in the usual setting, because it is not clear which fields of one object should be compared with which fields of the other. We have developed a new algorithm that efficiently finds the most likely duplicates in such a setting. Given these duplicates, our schema matching algorithm identifies corresponding attributes by comparing the data values of each duplicate pair. An experimental study on real-world data shows the effectiveness of this approach.
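The two-phase idea in the abstract (find duplicates without a field alignment, then compare values inside those duplicates) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices here are assumptions: each record is flattened into one bag of tokens and compared by plain cosine similarity (a TF-IDF weighting is a common refinement), the top `k` record pairs are simply taken as duplicates, field values are compared with the same token cosine, and the one-to-one attribute mapping is chosen by exhaustive search for the maximum total similarity. The function names, the parameters `k` and `min_score`, and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def tokens(text):
    # Hypothetical tokenizer: lowercase, split on whitespace.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity of two token-count vectors (plain counts, no IDF weighting).
    dot = sum(count * b[t] for t, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_duplicates(table_a, table_b, k):
    # Phase 1: flatten every record into one bag of tokens, so records can be
    # compared without knowing which field corresponds to which.
    bags_a = [Counter(tokens(" ".join(r))) for r in table_a]
    bags_b = [Counter(tokens(" ".join(r))) for r in table_b]
    scored = [(cosine(ba, bb), i, j)
              for i, ba in enumerate(bags_a)
              for j, bb in enumerate(bags_b)]
    scored.sort(reverse=True)  # most similar record pairs first
    return [(i, j) for _, i, j in scored[:k]]


def match_attributes(table_a, table_b, duplicates, min_score=0.0):
    # Phase 2: average field-to-field similarity over the duplicate pairs.
    n_a, n_b = len(table_a[0]), len(table_b[0])
    if n_a > n_b:
        raise ValueError("this sketch assumes table_a has at most as many columns as table_b")
    sim = [[0.0] * n_b for _ in range(n_a)]
    for i, j in duplicates:
        for p in range(n_a):
            for q in range(n_b):
                sim[p][q] += cosine(Counter(tokens(table_a[i][p])),
                                    Counter(tokens(table_b[j][q]))) / len(duplicates)
    # Maximum-weight one-to-one assignment by exhaustive search; only practical
    # for a handful of columns (a Hungarian-algorithm solver scales better).
    best = max(permutations(range(n_b), n_a),
               key=lambda perm: sum(sim[p][q] for p, q in enumerate(perm)))
    # Keep only attribute pairs whose averaged similarity exceeds min_score.
    return [(p, q, sim[p][q]) for p, q in enumerate(best) if sim[p][q] > min_score]


# Toy data (hypothetical): the same films in two sources with opaque,
# differently ordered columns; each source also has a record the other lacks.
table_a = [("the matrix", "1999", "lana wachowski"),
           ("alien", "1979", "ridley scott"),
           ("heat", "1995", "michael mann")]
table_b = [("ridley scott", "alien", "1979"),
           ("michael mann", "heat", "1995"),
           ("jane campion", "the piano", "1993")]

dups = find_duplicates(table_a, table_b, k=2)
print(dups)  # [(2, 1), (1, 0)]
print(match_attributes(table_a, table_b, dups))  # [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 1.0)]
```

Because phase 1 never aligns fields, it finds the two shared films even though `table_b` orders its columns differently; phase 2 then recovers the mapping (column 0 of `table_a` to column 1 of `table_b`, 1 to 2, and 2 to 0) purely from the values inside those two duplicate pairs.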