Advancing the discovery of unique column combinations

  • Authors:
  • Ziawasch Abedjan;Felix Naumann

  • Affiliations:
  • Hasso-Plattner-Institut, Potsdam, Germany;Hasso-Plattner-Institut, Potsdam, Germany

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Unique column combinations of a relational database table are sets of columns that contain only unique values. Discovering such combinations is a fundamental research problem and has many different data management and knowledge discovery applications. Existing discovery algorithms are either brute force or have a high memory load and can thus be applied only to small datasets or samples. In this paper, the well-known Gordian algorithm [9] and "Apriori-based" algorithms [4] are compared and analyzed for further optimization. We greatly improve the Apriori algorithms through efficient candidate generation and statistics-based pruning methods. A hybrid solution HCA-Gordian combines the advantages of Gordian and our new algorithm HCA, and it outperforms all previous work in many situations.