Running tree automata on probabilistic XML

  • Authors:
  • Sara Cohen;Benny Kimelfeld;Yehoshua Sagiv

  • Affiliations:
  • The Hebrew University, Jerusalem, Israel;IBM, San Jose, CA, USA;The Hebrew University, Jerusalem, Israel

  • Venue:
  • Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Tree automata (specifically, bottom-up and unranked) form a powerful tool for querying and maintaining validity of XML documents. XML with uncertain data can be modeled as a probability space of labeled trees, and that space is often represented by a tree with distributional nodes. This paper investigates the problem of evaluating a tree automaton over such a representation, where the goal is to compute the probability that the automaton accepts a random possible world. This problem is generally intractable, but for the case where the tree automaton is deterministic (and its transitions are defined by deterministic string automata), an efficient algorithm is presented. The paper discusses the applications of this result, including the ability to sample and to evaluate queries (e.g., in monadic second-order logic) while requiring a-priori conformance to a schema (e.g., DTD). XML schemas also include attribute constraints, and the complexity of key, foreign-key and inclusion constraints are studied in the context of probabilistic XML. Finally, the paper discusses the generalization of the results to an extended data model, where distributional nodes can repeatedly sample the same subtree, thereby adding another exponent to the size of the probability space.