Normalization theory for XML

  • Authors:
  • Leonid Libkin

  • Affiliations:
  • School of Informatics, University of Edinburgh

  • Venue:
  • XSym'07: Proceedings of the 5th International Conference on Database and XML Technologies
  • Year:
  • 2007

Abstract

Specifications of XML documents typically consist of typing information (e.g., a DTD) and integrity constraints. As with relational schema specifications, not all of them are good: some are prone to redundancies and update anomalies. In the relational world we have a well-developed theory of data design (also known as normalization). A few definitions of XML normal forms have been proposed, but the main question is why a particular design is good. In the XML world, we still lack universally accepted query languages (counterparts of relational algebra) or update languages that would let us reason about storage redundancies, lossless decompositions, and update anomalies. A better approach, therefore, is to come up with notions of good design based on the intrinsic properties of the model itself. We present such an approach, based on Shannon's information theory, and show how it applies to relational normal forms as well as to XML design, for both native and relational storage.
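The abstract does not spell out the information-theoretic measure itself. As a deliberately simplified illustration of the intuition behind it (a stored value that can be inferred from the rest of the data carries no new information), the Python sketch below flags positions in a relation whose values are forced by a functional dependency. The helper names `is_forced` and `redundant_positions`, the row/FD encoding, and the example data are hypothetical, and this is not the measure defined in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical encodings: a row maps attribute names to values,
# and an FD (X, A) stands for the functional dependency X -> A.
Row = Dict[str, str]
FD = Tuple[Tuple[str, ...], str]


def is_forced(instance: List[Row], fds: List[FD], i: int, attr: str) -> bool:
    """True if the value of `attr` in row i is fixed by some other row.

    Under an FD X -> attr, any other row that agrees with row i on X
    must also agree on attr, so the value at (i, attr) can be recovered
    without looking at it: a simplified notion of a redundant position.
    """
    row = instance[i]
    for lhs, rhs in fds:
        if rhs != attr or attr in lhs:
            continue  # FD does not determine this attribute (or is trivial)
        for j, other in enumerate(instance):
            if j != i and all(other[a] == row[a] for a in lhs):
                return True
    return False


def redundant_positions(instance: List[Row], fds: List[FD]) -> List[Tuple[int, str]]:
    """All (row index, attribute) positions whose value is forced."""
    return [
        (i, attr)
        for i, row in enumerate(instance)
        for attr in row
        if is_forced(instance, fds, i, attr)
    ]


fds: List[FD] = [(("dept",), "manager")]  # dept -> manager; dept is not a key

# Unnormalized design: the manager is repeated for every employee of a department.
employees = [
    {"emp": "ann", "dept": "db", "manager": "leo"},
    {"emp": "bob", "dept": "db", "manager": "leo"},
    {"emp": "cy", "dept": "ai", "manager": "mia"},
]

# Decomposed design: (emp, dept) is stored separately; each manager is stored once.
departments = [
    {"dept": "db", "manager": "leo"},
    {"dept": "ai", "manager": "mia"},
]

print(redundant_positions(employees, fds))    # [(0, 'manager'), (1, 'manager')]
print(redundant_positions(departments, fds))  # []
```

In the unnormalized table both `leo` entries are flagged, since each can be recovered from the other row in department `db`. After decomposing on `dept -> manager` (so that `dept` is a key of the table holding managers, as BCNF requires), no position is forced.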