Incremental validation of XML documents

  • Authors:
  • Andrey Balmin;Yannis Papakonstantinou;Victor Vianu

  • Affiliations:
  • IBM Almaden Research Center, San Jose, CA;University of California at San Diego, La Jolla, CA;University of California at San Diego, La Jolla, CA

  • Venue:
  • ACM Transactions on Database Systems (TODS)
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

We investigate the incremental validation of XML documents with respect to DTDs, specialized DTDs, and XML Schemas, under updates consisting of element tag renamings, insertions, and deletions. DTDs are modeled as extended context-free grammars. "Specialized DTDs" allow the decoupling of element types from element tags. XML Schemas are abstracted as specialized DTDs with limitations on the type assignment. For DTDs and XML Schemas, we exhibit an O(m log n) incremental validation algorithm using an auxiliary structure of size O(n), where n is the size of the document and m the number of updates. The algorithm does not handle the incremental validation of XML Schema wrt renaming of internal nodes, which is handled by the specialized DTDs incremental validation algorithm. For specialized DTDs, we provide an O(m log2 n) incremental algorithm, again using an auxiliary structure of size O(n). This is a significant improvement over brute-force re-validation from scratch.We exhibit a restricted class of DTDs called local that arise commonly in practice and for which incremental validation can be done in practically constant time by maintaining only a list of counters. We present implementations of both general incremental validation and local validation on an XML database built on top of a relational database.Our experimentation includes a study of the applicability of local validation in practice, results on the calibration of parameters of the auxiliary data structure, and results on the performance comparison between the general incremental validation technique, the local validation technique, and brute-force validation from scratch.