On multi-column foreign key discovery

Authors:
Meihui Zhang; Marios Hadjieleftheriou; Beng Chin Ooi; Cecilia M. Procopiuc; Divesh Srivastava
Affiliations:
National University of Singapore (Zhang, Ooi); AT&T Labs - Research (Hadjieleftheriou, Procopiuc, Srivastava)
Venue:
Proceedings of the VLDB Endowment
Year:
2010

Abstract

A foreign/primary key relationship between relational tables is one of the most important constraints in a database. From a data analysis perspective, discovering foreign keys is a crucial step in understanding and working with the data. Nevertheless, foreign key constraints are more often than not left unspecified, for various reasons: some associations are not known to the designers but are inherent in the data, while others become invalid because of data inconsistencies. This work proposes a robust algorithm for discovering both single-column and multi-column foreign keys. Previous work concentrated mostly on discovering single-column foreign keys using a variety of rules, such as inclusion dependencies, column names, and minimum/maximum values. We first propose a general rule, termed Randomness, that subsumes a variety of other rules. We then develop efficient approximation algorithms for evaluating randomness using only two passes over the data. Finally, we validate our approach via extensive experiments on real and synthetic datasets.
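The abstract does not spell out how Randomness is computed, so the Python sketch below shows one plausible reading. This reading is an assumption, not the paper's definition. Under it, the distinct values of a true foreign key should look like a uniform random sample of the referenced primary key's values. After both columns are mapped onto [0, 1] by rank in the primary key's sorted domain, the Earth Mover's Distance between them should therefore be small. The sketch pairs that score with an inclusion (coverage) check that tolerates a few inconsistent values. The names `emd_1d` and `evaluate_candidate`, the thresholds `min_coverage=0.95` and `max_emd=0.1`, and the lexicographic ordering of multi-column tuples are all hypothetical. Everything is computed exactly in memory; the paper's two-pass approximation algorithms are not reproduced here.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def emd_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Earth Mover's (Wasserstein-1) distance between two non-empty empirical
    distributions on the real line, computed as the area between their CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    total, ia, ib = 0.0, 0, 0
    for left, right in zip(points, points[1:]):
        # Advance each pointer past all values <= left; ia/len(a) is then the
        # CDF of `a` on the whole interval [left, right), and likewise for b.
        while ia < len(a) and a[ia] <= left:
            ia += 1
        while ib < len(b) and b[ib] <= left:
            ib += 1
        total += abs(ia / len(a) - ib / len(b)) * (right - left)
    return total


def evaluate_candidate(
    fk_values: Iterable[Hashable],
    pk_values: Iterable[Hashable],
    min_coverage: float = 0.95,  # hypothetical tolerance for dirty FK values
    max_emd: float = 0.1,        # hypothetical randomness threshold
) -> Tuple[bool, float, float]:
    """Hypothetical FK test: (passes, inclusion coverage, randomness score).

    Values may be scalars (single-column keys) or tuples (multi-column keys);
    ordering tuples lexicographically is a simplifying assumption."""
    pk_sorted = sorted(set(pk_values))
    fk_distinct = set(fk_values)
    if len(pk_sorted) < 2 or not fk_distinct:
        return False, 0.0, float("inf")
    rank = {v: i for i, v in enumerate(pk_sorted)}
    # Inclusion: fraction of distinct FK values that occur in the PK.
    included = [v for v in fk_distinct if v in rank]
    coverage = len(included) / len(fk_distinct)
    if not included:
        return False, coverage, float("inf")
    # Randomness: map both columns onto [0, 1] by rank in the sorted PK domain
    # and measure how far the FK values are from a uniform spread.
    scale = len(pk_sorted) - 1
    pk_pos = [i / scale for i in range(len(pk_sorted))]
    fk_pos = [rank[v] / scale for v in included]
    score = emd_1d(fk_pos, pk_pos)
    return coverage >= min_coverage and score <= max_emd, coverage, score


if __name__ == "__main__":
    customers = list(range(1, 101))  # PK: customer ids 1..100
    orders = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95, 15, 55]  # spread out, repeats
    counter = list(range(1, 11))  # fully included but clustered at the low end
    print(evaluate_candidate(orders, customers))   # passes
    print(evaluate_candidate(counter, customers))  # fails on randomness
```

In this example the clustered `counter` column satisfies inclusion perfectly, yet it fails the randomness test. This is the kind of spurious match that a pure inclusion-dependency rule would accept.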