XML (eXtensible Markup Language) has evolved into the standard data exchange format for the World Wide Web. Its main advantages are that it offers an intuitive and standard way of structuring a very wide range of data and that it admits user-defined tags. The latter allows user communities to develop their own XML document formats, each defined by an XML schema. The presence of such a schema improves the efficiency of many tasks, for instance query processing, query optimization, and automatic data integration. The dissertation is divided into two parts. The first part studies the typechecking problem for XML-to-XML transformations. The typechecking problem asks, given an input schema, an output schema, and a transformation, whether the output of the transformation always conforms to the output schema whenever its input conforms to the input schema. We focus on identifying practical and tractable fragments of this problem. In particular, we exhibit a large tractable class in which transformations may delete, but the number of copies they make of certain parts of the input tree is bounded. The second part studies the expressive power and the complexity of basic decision problems for XML schema languages. We discuss several syntactic and semantic characterizations of the Element Declarations Consistent (EDC) constraint of W3C XML Schema. We argue that cleaner, more expressive, more robust, but equally feasible schema languages can be obtained by replacing EDC with the notion of 1-Pass Preorder Typing (1PPT) or Top-Down Typing (TDT). The former essentially allows a schema to determine the type of an element of a streaming document when its opening tag is met; the latter allows the type of an element to be determined when it is met while reading the DOM tree in a top-down fashion. In terms of expressive power, EDC is strictly included in 1PPT, which in turn is strictly included in TDT.
We further consider problems such as inclusion, equivalence, intersection-non-emptiness, and minimization of such schemas. Surprisingly, the complexity of these problems is essentially the same for schemas with the EDC constraint as for their more expressive 1PPT and TDT variants. Finally, we discuss the problem of minimizing schemas in languages with the expressive power of unranked regular tree languages.
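
To make the idea of typing an element at its opening tag concrete, the following is a minimal sketch of the simplest (EDC-like) case, in which an element's type is fixed by its parent's type and its own label. The toy schema, the names `CHILD_TYPE`, `ROOT_TYPE`, and `assign_types`, and the event format are all assumptions made for illustration and do not come from the dissertation. The sketch assumes a well-formed event stream, does not check content models, and does not cover the more general 1PPT or TDT cases.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy schema: (parent type, child label) -> child type.
# The label "item" gets a different type depending on its parent's type.
CHILD_TYPE: Dict[Tuple[str, str], str] = {
    ("Store", "item"): "NewItem",
    ("Store", "used"): "Used",
    ("Used", "item"): "UsedItem",
}
# Hypothetical mapping from root label to root type.
ROOT_TYPE: Dict[str, str] = {"store": "Store"}

Event = Tuple[str, str]  # ("open" | "close", label)


def assign_types(events: List[Event]) -> Optional[List[Tuple[str, str]]]:
    """Assign a type to every element as soon as its opening tag is read.

    Returns (label, type) pairs in document order, or None if some element
    cannot be typed or the tags are unbalanced.
    """
    stack: List[str] = []              # types of the currently open elements
    typed: List[Tuple[str, str]] = []
    for kind, label in events:
        if kind == "open":
            if stack:
                elem_type = CHILD_TYPE.get((stack[-1], label))
            else:
                elem_type = ROOT_TYPE.get(label)
            if elem_type is None:
                return None            # no type is declared for this element here
            stack.append(elem_type)
            typed.append((label, elem_type))
        else:
            if not stack:
                return None            # closing tag without a matching opening tag
            stack.pop()
    return typed if not stack else None


# <store><item/><used><item/></used></store> as a stream of tag events
doc: List[Event] = [
    ("open", "store"), ("open", "item"), ("close", "item"),
    ("open", "used"), ("open", "item"), ("close", "item"),
    ("close", "used"), ("close", "store"),
]
print(assign_types(doc))
```

In this sketch the two `item` elements receive different types even though they carry the same label, and each type is known before any of the element's content has been read.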