An information-theoretic approach to normal forms for relational and XML data

  • Authors:
  • Marcelo Arenas;Leonid Libkin

  • Affiliations:
  • University of Toronto, Toronto, Ontario, Canada;University of Toronto, Toronto, Ontario, Canada

  • Venue:
  • Journal of the ACM (JACM)
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Normalization as a way of producing good relational database designs is a well-understood topic. However, the same problem of distinguishing well-designed databases from poorly designed ones arises in other data models, in particular, XML. While, in the relational world, the criteria for being well designed are usually very intuitive and clear to state, they become more obscure when one moves to more complex data models.Our goal is to provide a set of tools for testing when a condition on a database design, specified by a normal form, corresponds to a good design. We use techniques of information theory, and define a measure of information content of elements in a database with respect to a set of constraints. We first test this measure in the relational context, providing information-theoretic justification for familiar normal forms such as BCNF, 4NF, PJ/NF, 5NFR, DK/NF. We then show that the same measure applies in the XML context, which gives us a characterization of a recently introduced XML normal form called XNF. Finally, we look at information-theoretic criteria for justifying normalization algorithms.