Running tree automata on probabilistic XML

Authors:
Sara Cohen; Benny Kimelfeld; Yehoshua Sagiv
Affiliations:
The Hebrew University, Jerusalem, Israel; IBM, San Jose, CA, USA; The Hebrew University, Jerusalem, Israel
Venue:
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2009

Abstract

Tree automata (specifically, bottom-up and unranked) form a powerful tool for querying XML documents and maintaining their validity. XML with uncertain data can be modeled as a probability space of labeled trees, and that space is often represented by a tree with distributional nodes. This paper investigates the problem of evaluating a tree automaton over such a representation, where the goal is to compute the probability that the automaton accepts a random possible world. This problem is generally intractable, but an efficient algorithm is presented for the case where the tree automaton is deterministic (and its transitions are defined by deterministic string automata). The paper discusses applications of this result, including the ability to sample and to evaluate queries (e.g., in monadic second-order logic) while requiring a priori conformance to a schema (e.g., a DTD). XML schemas also include attribute constraints, and the complexity of key, foreign-key, and inclusion constraints is studied in the context of probabilistic XML. Finally, the paper discusses the generalization of the results to an extended data model, where distributional nodes can repeatedly sample the same subtree, thereby adding another exponent to the size of the probability space.
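
To make the central problem concrete, the following is a minimal sketch of computing an acceptance probability bottom-up. It is not the paper's algorithm. It assumes a simplified data model in which every child is kept independently with a given probability (roughly the behavior of an "independent" distributional node) and the root always exists; other kinds of distributional nodes are omitted. The names PNode, LabelDFA, state_distribution, and acceptance_probability are hypothetical. The automaton is deterministic: each label has a deterministic string automaton that reads the states of the surviving children from left to right, and the string automaton's final state determines the state of the node.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PNode:
    label: str
    # List of (probability, child) pairs. Assumption of this sketch:
    # each child is kept independently with the given probability.
    children: list = field(default_factory=list)


@dataclass
class LabelDFA:
    start: str
    delta: dict   # (dfa_state, child_tree_state) -> dfa_state
    output: dict  # dfa_state -> tree-automaton state assigned to the node


def state_distribution(node, dfas):
    """Return {tree_state: probability} for the random subtree rooted at node."""
    dfa = dfas[node.label]
    dist = {dfa.start: 1.0}  # distribution over states of the string automaton
    for p, child in node.children:
        child_dist = state_distribution(child, dfas)
        nxt = defaultdict(float)
        for s, ps in dist.items():
            nxt[s] += ps * (1.0 - p)  # child absent: the string automaton does not move
            for q, pq in child_dist.items():
                nxt[dfa.delta[(s, q)]] += ps * p * pq  # child present in state q
        dist = nxt
    out = defaultdict(float)
    for s, ps in dist.items():
        out[dfa.output[s]] += ps
    return dict(out)


def acceptance_probability(root, dfas, accepting):
    dist = state_distribution(root, dfas)
    return sum(p for q, p in dist.items() if q in accepting)


# Toy automaton: a node labeled "a" gets state "ok" iff it has a child labeled "b".
tree_states = ["B", "ok", "no"]
dfas = {
    "b": LabelDFA("s", {("s", q): "s" for q in tree_states}, {"s": "B"}),
    "a": LabelDFA(
        "none",
        {
            **{("none", q): ("seen" if q == "B" else "none") for q in tree_states},
            **{("seen", q): "seen" for q in tree_states},
        },
        {"none": "no", "seen": "ok"},
    ),
}

# Root "a" with two "b" children, each present with probability 0.5.
doc = PNode("a", [(0.5, PNode("b")), (0.5, PNode("b"))])
prob = acceptance_probability(doc, dfas, {"ok"})
print(prob)  # 0.75, i.e. 1 - 0.5 * 0.5
```

Because the automaton is deterministic, each possible world contributes to exactly one state at each node, so the per-node distributions can be combined by simple multiplication and addition. With a nondeterministic automaton the same world could reach several states, and this bookkeeping would no longer yield a probability distribution.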