Static analysis of XML transformation and schema languages

  • Authors:
  • Frank Neven; Wim Martens

  • Affiliations:
  • Universiteit Antwerpen (Belgium); Universiteit Antwerpen (Belgium)

  • Venue:
  • Static analysis of XML transformation and schema languages
  • Year:
  • 2006

Abstract

XML (eXtensible Markup Language) has evolved into the standard data exchange format for the World Wide Web. Its main advantages are that it offers an intuitive and standard way of structuring a very wide range of data and that it admits user-defined tags. The latter allows user communities to develop their own formats for XML documents, each defined by an XML schema. The presence of such a schema improves the efficiency of many tasks, for instance query processing, query optimization, and automatic data integration.

The dissertation is divided into two parts. The first part studies the typechecking problem for XML-to-XML transformations. The typechecking problem asks, given an input schema, an output schema, and a transformation, whether the output of the transformation always conforms to the output schema whenever its input conforms to the input schema. We focus on identifying practical and tractable fragments of this problem. In particular, we exhibit a large tractable class in which transformations may delete parts of the input, but the number of copies they make of certain parts of the input tree is bounded.

The second part studies the expressive power and the complexity of basic decision problems for XML schema languages. We discuss several syntactic and semantic characterizations of the Element Declarations Consistent (EDC) constraint of W3C XML Schema. We argue that cleaner, more expressive, more robust, but equally feasible schema languages can be obtained by replacing EDC with the notion of 1-Pass Preorder Typing (1PPT) or Top-Down Typing (TDT). The former essentially allows the type of an element of a streaming document to be determined when its opening tag is met; the latter allows the type of an element to be determined when it is reached while reading the DOM tree in a top-down fashion. In terms of expressive power, EDC is strictly included in 1PPT, which in turn is strictly included in TDT. We further consider problems such as inclusion, equivalence, intersection non-emptiness, and minimization of such schemas. Surprisingly, the complexity of these decision problems is essentially the same for schemas with the EDC constraint as for their more expressive 1PPT and TDT variants. Finally, we discuss the minimization problem for schema languages with the expressive power of unranked regular tree languages.
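
As a rough illustration of what it means for element types to be determined top-down, the sketch below assigns a type to every element of a small document using only the parent's type and the element's own label. This is an assumption-laden toy, not the dissertation's formal definition of EDC, 1PPT, or TDT: the schema tables `ROOT_TYPE` and `CHILD_TYPE`, the type names, and the function `assign_types` are all hypothetical, and content models (the allowed order and number of children) are not checked.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema. For each type, a child label maps to exactly one
# child type, so two children with the same label under the same parent type
# always receive the same type (an EDC-style restriction).
ROOT_TYPE = {"store": "Store"}
CHILD_TYPE = {
    "Store": {"dvd": "DvdForSale", "discount": "Discount"},
    "Discount": {"dvd": "DvdOnOffer"},
    "DvdForSale": {"title": "Text", "price": "Text"},
    "DvdOnOffer": {"title": "Text", "price": "Text", "rebate": "Text"},
    "Text": {},
}


def assign_types(root):
    """Return (path, type) pairs in document order.

    The type of each element depends only on its parent's type and its own
    label. Raises ValueError when a label is not allowed under the parent.
    """
    if root.tag not in ROOT_TYPE:
        raise ValueError(f"unexpected root element <{root.tag}>")
    typed = []
    stack = [(root, ROOT_TYPE[root.tag], "/" + root.tag)]
    while stack:
        node, node_type, path = stack.pop()
        typed.append((path, node_type))
        allowed = CHILD_TYPE[node_type]
        # Push children in reverse so they are popped in document order.
        for child in reversed(list(node)):
            if child.tag not in allowed:
                raise ValueError(f"<{child.tag}> not allowed under type {node_type}")
            stack.append((child, allowed[child.tag], path + "/" + child.tag))
    return typed


if __name__ == "__main__":
    doc = ET.fromstring(
        "<store><dvd><title>A</title></dvd>"
        "<discount><dvd><title>B</title><rebate>1</rebate></dvd></discount></store>"
    )
    for path, node_type in assign_types(doc):
        print(path, node_type)
```

In this toy schema the two `dvd` elements receive different types because their parents have different types, even though the labels are identical. The typing decision never looks at an element's descendants or later siblings, which is the property that makes this style of schema suitable for streaming or top-down processing.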