We address the problem of matching imperfectly documented schemas of data streams and large databases. Instance-level schema matching algorithms identify likely correspondences between attributes by quantifying the similarity of their corresponding values. However, calculating these similarities exactly requires processing all database records, which is infeasible for data streams. We devise a fast matching algorithm that uses only a small sample of records, yet is guaranteed to find a matching that closely approximates the one that would be obtained if the entire stream were processed. The method can be applied to any similarity metric, or combination of metrics, that can be estimated from a sample with bounded error; we apply the algorithm to several such metrics. We give a rigorous proof of the method's correctness and report on experiments using large databases.
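To illustrate what "estimated from a sample with bounded error" can mean, here is a minimal Python sketch. It is not the algorithm of the paper. It assumes a hypothetical per-value feature (`digit_fraction`) with values in [0, 1], a hypothetical similarity defined as one minus the absolute difference of the two attributes' mean feature values, and Hoeffding's inequality as the error bound. All function names are illustrative.

```python
import math
import random


def hoeffding_sample_size(eps, delta):
    """Number of i.i.d. samples so that the mean of [0, 1]-valued
    observations is within eps of the true mean with probability
    at least 1 - delta (two-sided Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def digit_fraction(value):
    """Hypothetical feature in [0, 1]: share of digit characters."""
    s = str(value)
    return sum(c.isdigit() for c in s) / len(s) if s else 0.0


def sampled_mean(column, n, rng):
    """Mean feature value over n records drawn with replacement."""
    return sum(digit_fraction(rng.choice(column)) for _ in range(n)) / n


def estimated_similarity(col_a, col_b, eps, delta, rng):
    """Estimate 1 - |mean_a - mean_b| from samples only.

    Each mean is within eps of its true value with probability
    >= 1 - delta, so by the union bound the similarity is within
    2 * eps of its true value with probability >= 1 - 2 * delta.
    """
    n = hoeffding_sample_size(eps, delta)
    mean_a = sampled_mean(col_a, n, rng)
    mean_b = sampled_mean(col_b, n, rng)
    return 1.0 - abs(mean_a - mean_b)


rng = random.Random(0)
phones = ["555-0100", "555-0199", "555-0142"]   # toy attribute values
zips = ["12345", "90210", "10001"]
names = ["alice", "bob", "carol"]

sim_phone_zip = estimated_similarity(phones, zips, 0.05, 0.05, rng)
sim_phone_name = estimated_similarity(phones, names, 0.05, 0.05, rng)
```

The sample size depends only on the tolerance `eps` and the failure probability `delta`, not on the number of records, which is why an approach of this kind remains usable on streams. In this toy data the phone attribute comes out more similar to the zip attribute than to the name attribute.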