Efficient discovery of join plans in schemaless data

Authors:
Aybar C. Acar;Amihai Motro
Affiliations:
Bilkent University, Ankara, Turkey;George Mason University, Fairfax, VA
Venue:
IDEAS '09 Proceedings of the 2009 International Database Engineering & Applications Symposium
Year:
2009

Citing 21
Cited 0

Resolving the query inference problem using Steiner trees

ACM Transactions on Database Systems (TODS)
A Theory of Attributed Equivalence in Databases with Application to Schema Integration

IEEE Transactions on Software Engineering
Algorithms for Enumerating All Spanning Trees of Undirected and Weighted Graphs

SIAM Journal on Computing
Semistructured data

PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
On the foundations of the universal relation model

ACM Transactions on Database Systems (TODS)
Maximal objects and the semantics of universal relation databases

ACM Transactions on Database Systems (TODS)
SEMINT: a tool for identifying attribute correspondences in heterogeneous databases using neural networks

Data & Knowledge Engineering
What happened when database researchers met usability

Information Systems
Reconciling schemas of disparate data sources: a machine-learning approach

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
DBXplorer: enabling keyword search over relational databases

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
FLEX: A Tolerant and Cooperative User Interface to Databases

IEEE Transactions on Knowledge and Data Engineering
An incremental interactive algorithm for grammar inference

ICGI '96 Proceedings of the 3rd International Colloquium on Grammatical Inference: Learning Syntax from Sentences
Database Schema Matching Using Machine Learning with Feature Selection

CAiSE '02 Proceedings of the 14th International Conference on Advanced Information Systems Engineering
A survey of approaches to automatic schema matching

The VLDB Journal — The International Journal on Very Large Data Bases
Discovering all most specific sentences

ACM Transactions on Database Systems (TODS)
A Schema Analysis and Reconciliation Tool Environment for Heterogeneous Databases

IDEAS '99 Proceedings of the 1999 International Symposium on Database Engineering & Applications
Table extraction using conditional random fields

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
CORDS: automatic discovery of correlations and soft functional dependencies

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
INFER: a relational query language without the complexity of SQL

Proceedings of the 14th ACM international conference on Information and knowledge management
Discover: keyword search in relational databases

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
BHUNT: automatic discovery of fuzzy algebraic constraints in relational data

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Abstract

We describe a method for inferring join plans for a set of relation instances in the absence of any metadata, such as attribute domains, attribute names, or constraints (e.g., keys or foreign keys). Our method enumerates the possible join plans in order of likelihood, based on the compatibility of each pair of columns and their suitability as join attributes (i.e., their appropriateness as keys). We outline two variants of the approach. The first variant is accurate but potentially time-consuming, especially for large relations that do not fit in memory. The second variant approximates the first and is therefore less accurate, but it is considerably more efficient, allowing the method to be used online, even for large relations. We provide experimental results showing how the performance of both variants scales as the number of candidate join attributes and the size of the relations increase. We also characterize the accuracy of the approximate variant with respect to the exact variant.
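
The abstract does not give the scoring functions, so the following is only a minimal illustrative sketch of the general idea of ranking candidate join columns without metadata. It is not the authors' algorithm. The names `key_likeness`, `containment`, and `rank_join_candidates`, the choice of value containment as the compatibility measure, and the distinct-value ratio as the key-suitability measure are all assumptions made for illustration.

```python
from itertools import product


def key_likeness(column):
    """Assumed key-suitability measure: fraction of distinct values.

    1.0 means every value is unique, as it would be for a key.
    """
    return len(set(column)) / len(column) if column else 0.0


def containment(source, target):
    """Assumed compatibility measure: share of distinct source values found in target."""
    src, tgt = set(source), set(target)
    return len(src & tgt) / len(src) if src else 0.0


def rank_join_candidates(left, right):
    """Score every (left column, right column) pair, highest score first.

    `left` and `right` map positional column ids to lists of values,
    since no attribute names are assumed to be available.
    """
    scored = []
    for (lc, lvals), (rc, rvals) in product(left.items(), right.items()):
        # Foreign-key-like reading in each direction: values of one column
        # are contained in a key-like column of the other relation.
        forward = containment(lvals, rvals) * key_likeness(rvals)
        backward = containment(rvals, lvals) * key_likeness(lvals)
        scored.append((max(forward, backward), lc, rc))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Hypothetical toy relations: column 1 of `orders` refers to column 0 of `customers`.
orders = {0: [1, 2, 3, 4], 1: [10, 10, 20, 30]}
customers = {0: [10, 20, 30], 1: ["ann", "bob", "cy"]}
ranked = rank_join_candidates(orders, customers)
best_score, best_left, best_right = ranked[0]
```

In this toy input the top-ranked pair is column 1 of `orders` with column 0 of `customers`. An approximation in the same spirit could compute these scores over a sample of rows instead of full columns; whether the paper's second variant works that way is not stated in the abstract.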