On schema matching with opaque column names and data values

Authors:
Jaewoo Kang; Jeffrey F. Naughton
Affiliations:
University of Wisconsin-Madison, Madison, WI;University of Wisconsin-Madison, Madison, WI
Venue:
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Year:
2003

Abstract

Most previous solutions to the schema matching problem rely in some fashion upon identifying "similar" column names in the schemas to be matched, or upon recognizing common domains in the data stored in the schemas. While each of these approaches is valuable in many cases, neither is infallible, and there exist instances of the schema matching problem to which they do not apply at all. Such problem instances typically arise when the column names in the schemas and the data in the columns are "opaque" or very difficult to interpret. In this paper we propose a two-step technique that works even in the presence of opaque column names and data values. In the first step, we measure the pair-wise attribute correlations in the tables to be matched and construct a dependency graph for each table, using mutual information as a measure of the dependency between attributes. In the second step, we find matching node pairs in the dependency graphs by running a graph matching algorithm. We validate our approach with an experimental study, the results of which suggest that such an approach can be a useful addition to a set of (semi-)automatic schema matching techniques.
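
The two steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes discrete column values, puts each column's entropy on the diagonal of the dependency matrix, and replaces the graph matching algorithm with an exhaustive search over column permutations under an assumed squared-difference cost, which is only feasible for a handful of columns. All function names and the toy tables are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def entropy(column):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length columns."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def dependency_graph(table):
    """Step 1: weighted adjacency matrix with entry [i][j] = I(col_i; col_j).

    The diagonal holds I(X;X) = H(X), the entropy of each column.
    """
    cols = list(zip(*table))  # turn a list of rows into a list of columns
    k = len(cols)
    return [[mutual_information(cols[i], cols[j]) for j in range(k)]
            for i in range(k)]


def match_graphs(a, b):
    """Step 2 (assumed stand-in): try every column permutation and keep the
    one minimizing the squared difference between the two matrices.

    Returns a tuple perm where column i of the first table is matched to
    column perm[i] of the second table.
    """
    k = len(a)

    def cost(perm):
        return sum((a[i][j] - b[perm[i]][perm[j]]) ** 2
                   for i in range(k) for j in range(k))

    return min(permutations(range(k)), key=cost)


# Toy source table with three columns (A, B, C).
table1 = [
    (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 0),
    (1, 2, 0), (1, 2, 0), (1, 3, 1), (1, 3, 1),
]

# Second table: same rows, columns reordered to (C, A, B) and every value
# re-encoded, so neither names nor values can be compared directly.
enc_a = {0: "x7", 1: "q2"}
enc_b = {0: "k", 1: "m", 2: "z", 3: "p"}
enc_c = {0: 91, 1: 17}
table2 = [(enc_c[c], enc_a[a], enc_b[b]) for a, b, c in table1]

mapping = match_graphs(dependency_graph(table1), dependency_graph(table2))
print(mapping)  # (1, 2, 0): A -> column 1, B -> column 2, C -> column 0
```

Because entropy and mutual information depend only on the distribution of values and not on the values themselves, the re-encoding in `table2` leaves the dependency matrix unchanged up to column order, which is what lets the search recover the correspondence.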