Design principles for XML data

  • Authors: Marcelo Arenas
  • Affiliations: University of Toronto (Canada)
  • Venue: Design principles for XML data
  • Year: 2005

Abstract

In this dissertation, we take a first step towards a design and normalization theory for XML documents. We start by noting that while the criteria for being well designed are very intuitive in the relational world, they become more obscure when one moves to XML. Thus, our first contribution is to provide a tool for testing when a condition on a database design, specified as a normal form, corresponds to a good design. We use techniques from information theory and define a measure of the information content of elements in a database with respect to a set of constraints. This measure can be used in different data models; in particular, we use it in the relational model to provide an information-theoretic justification for well-known normal forms and for normalization algorithms. As our second contribution, we introduce languages for XML data dependencies, which are used later as the source of semantic information in the design of XML databases. Since inconsistent XML specifications may arise in practice because of the interaction between these dependencies and the constraints imposed by XML schemas (DTDs), our next contribution is to pinpoint the complexity of checking consistency of XML specifications. We then show that XML documents may contain redundant information and may be prone to update anomalies. Thus, our final contribution is to define an XML normal form, XNF, that avoids update anomalies and redundancies. We study its properties and show that it generalizes BCNF and that it can be justified by our information-theoretic measure. We present an algorithm for converting any XML schema into an equivalent one in XNF, and we use our information-theoretic measure to justify this algorithm.
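
For readers unfamiliar with BCNF, the relational normal form that XNF is said to generalize, the following is a minimal sketch of a BCNF check using the standard attribute-closure test. It is not the dissertation's XNF algorithm and does not handle XML or DTDs; the function names (`closure`, `bcnf_violations`) and the example schema are hypothetical and chosen only for illustration.

```python
def closure(attrs, fds):
    """Return the set of attributes implied by `attrs` under the FDs in `fds`.

    `fds` is a list of (lhs, rhs) pairs, each an iterable of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply X -> Y whenever X is already implied and Y adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the given FDs X -> Y that are nontrivial and whose X is not a superkey."""
    all_attrs = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != all_attrs
    ]


# Hypothetical example schema: each teacher teaches one course,
# and a (student, course) pair determines the teacher.
schema = ("student", "course", "teacher")
fds = [
    (("student", "course"), ("teacher",)),
    (("teacher",), ("course",)),
]
violations = bcnf_violations(schema, fds)
```

In this example, `teacher -> course` is reported as a violation because `teacher` alone does not determine `student`, so the course is stored redundantly for every student of the same teacher. This is the kind of redundancy and update anomaly that the abstract attributes to poorly designed XML documents as well.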