On redundancy vs dependency preservation in normalization: an information-theoretic study of 3NF

  • Authors:
  • Solmaz Kolahi;Leonid Libkin

  • Affiliations:
  • University of Toronto;University of Toronto and University of Edinburgh

  • Venue:
  • Proceedings of the twenty-fifth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

A recently introduced information-theoretic approach to analyzing redundancies in database design was used to justify normal forms like BCNF that completely eliminate redundancies. The main notion is that of an information content of each datum in an instance (which is a number in [0,1]): the closer to 1, the less redundancy it carries. In practice, however, one usually settles for 3NF which, unlike BCNF, may not eliminate all redundancies but always guarantees dependency preservation.In this paper we use the information-theoretic approach to prove that 3NF is the best normal form if one needs to achieve dependency preservation. For each dependency-preserving normal form, we define the price of dependency preservation as an information-theoretic measure of redundancy that gets introduced to compensate for dependency preservation. This is a number in the [0,1] range: the smaller it is, the less redundancy a normal form guarantees. We prove that for every dependency-preserving normal form, the price of dependency preservation is at least 1/2, and it is precisely 1/2 for 3NF. Hence, 3NF has the least amount of redundancy among all dependency-preserving normal forms. We also show that, information-theoretically, unnormalized schemas have at least twice the amount of redundancy than schemas in 3NF.