Advancing the discovery of unique column combinations

Authors:
Ziawasch Abedjan;Felix Naumann
Affiliations:
Hasso-Plattner-Institut, Potsdam, Germany;Hasso-Plattner-Institut, Potsdam, Germany
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 4
Cited 2

Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

SQL-based discovery of exact and approximate functional dependencies
Working group reports from ITiCSE on Innovation and technology in computer science education

GORDIAN: efficient and scalable discovery of composite keys
VLDB '06 Proceedings of the 32nd international conference on Very large data bases

Database dependency discovery: a machine learning approach
AI Communications

The data analytics group at the Qatar Computing Research Institute
ACM SIGMOD Record

Data profiling revisited
ACM SIGMOD Record

Abstract

Unique column combinations of a relational database table are sets of columns that contain only unique values. Discovering such combinations is a fundamental research problem and has many different data management and knowledge discovery applications. Existing discovery algorithms are either brute force or have a high memory load and can thus be applied only to small datasets or samples. In this paper, the well-known Gordian algorithm [9] and "Apriori-based" algorithms [4] are compared and analyzed for further optimization. We greatly improve the Apriori algorithms through efficient candidate generation and statistics-based pruning methods. A hybrid solution HCA-Gordian combines the advantages of Gordian and our new algorithm HCA, and it outperforms all previous work in many situations.
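
To make the problem statement concrete, the following is a minimal sketch of a level-wise ("Apriori-style") search for minimal unique column combinations. It is not the HCA or Gordian algorithm from the paper: the function names `is_unique` and `minimal_uniques`, the representation of a table as a list of tuples, and the hash-based uniqueness check are all assumptions made for illustration, and the statistics-based pruning mentioned in the abstract is omitted.

```python
from itertools import combinations


def is_unique(rows, cols):
    """Return True if projecting rows onto cols yields no duplicate tuples."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uniques(rows, num_cols):
    """Illustrative level-wise search for minimal unique column combinations.

    Columns are identified by index. Candidates of size k+1 are built only
    from non-unique combinations of size k, so every reported unique is
    minimal (none of its proper subsets is unique).
    """
    uniques = []
    level = [(c,) for c in range(num_cols)]
    while level:
        non_unique = []
        for cand in level:
            if is_unique(rows, cand):
                uniques.append(cand)
            else:
                non_unique.append(cand)

        # Apriori-style candidate generation: join two non-unique
        # combinations that share all but their last column.
        non_unique_set = set(non_unique)
        next_level = []
        for a, b in combinations(non_unique, 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                # Keep the candidate only if every subset one column
                # smaller is itself non-unique.
                if all(sub in non_unique_set
                       for sub in combinations(cand, len(a))):
                    next_level.append(cand)
        level = next_level
    return uniques


# Hypothetical four-row table with three columns.
table = [
    (1, "a", "x"),
    (1, "b", "x"),
    (2, "a", "y"),
    (2, "b", "y"),
]
result = minimal_uniques(table, 3)
```

In this toy table no single column is unique, while the column pairs (0, 1) and (1, 2) are, so `result` holds those two combinations. Each uniqueness check scans the whole table, which illustrates why reducing the number of candidates matters on larger datasets.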