Schema matching on streams with accuracy guarantees

Authors:
Szymon Jaroszewicz;Lenka Ivantysynova;Tobias Scheffer
Affiliations:
National Institute of Telecommunications, Warsaw, Poland. E-mail: s.jaroszewicz@itl.waw.pl;Humboldt-Universität zu Berlin, Berlin, Germany. E-mail: lenka@wiwi.hu-berlin.de;Max Planck Institute for Computer Science, Saarbrücken, Germany. E-mail: scheffer@mpi-inf.mpg.de
Venue:
Intelligent Data Analysis - Knowledge Discovery from Data Streams
Year:
2008

Abstract

We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, computing these similarities exactly requires processing every database record, which is infeasible for data streams. We devise a fast matching algorithm that uses only a small sample of records, yet is guaranteed to find a matching that closely approximates the one that would be obtained if the entire stream were processed. The method can be applied to any similarity metric, or combination of metrics, that can be estimated from a sample with bounded error; we apply the algorithm to several metrics. We give a rigorous proof of the method's correctness and report on experiments using large databases.
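
To make the phrase "estimated from a sample with bounded error" concrete, here is a minimal sketch. It is not the algorithm from the paper, which the abstract does not spell out. It assumes that each candidate attribute pair has a per-record similarity score in [0, 1], that records arrive as independent draws, and that a Hoeffding bound with a union bound over pairs is an acceptable way to fix the sample size. The names `sample_size`, `hoeffding_radius`, and `estimate_similarities` are hypothetical.

```python
import math
from itertools import repeat


def hoeffding_radius(n, delta, value_range=1.0):
    """Half-width of a two-sided Hoeffding confidence interval after n samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def sample_size(epsilon, delta, value_range=1.0):
    """Smallest n for which hoeffding_radius(n, delta, value_range) <= epsilon."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_similarities(record_scores, pairs, epsilon, delta):
    """Estimate the mean similarity of each attribute pair from a stream prefix.

    record_scores: iterable yielding, per sampled record, a dict that maps
                   each pair in `pairs` to a score in [0, 1] (assumed i.i.d.).
    Returns (estimates, n). Under the i.i.d. assumption, all estimates are
    within epsilon of their true means with probability at least 1 - delta,
    because delta is split evenly across the pairs (union bound).
    """
    n = sample_size(epsilon, delta / len(pairs))
    totals = {p: 0.0 for p in pairs}
    seen = 0
    for scores in record_scores:
        for p in pairs:
            totals[p] += scores[p]
        seen += 1
        if seen == n:
            break  # stop reading the stream once the bound is met
    if seen < n:
        raise ValueError("stream ended before enough records were sampled")
    return {p: totals[p] / n for p in pairs}, n


if __name__ == "__main__":
    # Toy stream with constant (hypothetical) scores for two candidate pairs.
    pairs = [("name", "customer"), ("name", "zip")]
    stream = repeat({pairs[0]: 0.9, pairs[1]: 0.2})
    estimates, n = estimate_similarities(stream, pairs, epsilon=0.05, delta=0.05)
    best = max(estimates, key=estimates.get)
    print(n, best, estimates)
```

The sample size depends only on the error tolerance, the failure probability, and the number of candidate pairs, not on the length of the stream, which is what makes a sample-based approach usable when the full data can never be scanned.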