Generating, sampling and counting subclasses of regular tree languages

Authors:
Timos Antonopoulos;Floris Geerts;Wim Martens;Frank Neven
Affiliations:
Hasselt University and Transnational University of Limburg;University of Edinburgh;Technische Universität Dortmund;Hasselt University and Transnational University of Limburg
Venue:
Proceedings of the 14th International Conference on Database Theory
Year:
2011

Abstract

To experimentally validate learning and approximation algorithms for XML Schema Definitions (XSDs), we need algorithms that generate a corpus of XSDs uniformly at random, as well as a similarity measure that quantifies how closely a generated XSD resembles the target schema. In this paper, we provide the formal foundation for such a testbed. We adopt similarity measures based on counting the number of common and different trees in the two languages, and we develop the necessary machinery for computing them. We use the formalism of extended DTDs (EDTDs) to represent the unranked regular tree languages. In particular, we obtain an efficient algorithm to count the number of trees up to a certain size in an unambiguous EDTD. The class of unambiguous EDTDs encompasses the more familiar classes of single-type, restrained competition and bottom-up deterministic EDTDs. The single-type EDTDs correspond precisely to the core of XML Schema, while the others are strictly more expressive. We also show how constraints on the shape of allowed trees can be incorporated. As we make use of a translation into a well-known formalism for combinatorial specifications, we get for free a sampling procedure to draw members of any unambiguous EDTD. When dropping the restriction to unambiguous EDTDs, i.e., taking the full class of EDTDs into account, we show that the counting problem becomes #P-complete and provide an approximation algorithm. Finally, we discuss uniform generation of single-type EDTDs, i.e., the formal abstraction of XSDs. To this end, we provide an algorithm to generate k-occurrence automata (k-OAs) uniformly at random and show how this leads to uniform generation of single-type EDTDs.
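To make the counting problem concrete, the following is a minimal sketch in Python. It is not the algorithm of the paper, which works through a translation into combinatorial specifications; it is a plain dynamic program for the simpler case of a DTD whose content models are given as deterministic finite automata, so that every tree has exactly one run and is counted once. The `DTD` encoding and the function names `count_trees`, `count_hedges` and `count_up_to` are assumptions made for this illustration.

```python
from functools import lru_cache

# Hypothetical encoding (an assumption of this sketch): each label maps to
# (start state, accepting states, transitions), where transitions maps
# (state, child_label) -> next state. Determinism gives unambiguity.
DTD = {
    "a": (0, frozenset({0}), {(0, "a"): 0}),  # content model a*
}


@lru_cache(maxsize=None)
def count_trees(label, n):
    """Number of trees with root `label` and exactly n nodes."""
    if n < 1:
        return 0
    start, _, _ = DTD[label]
    # The root uses one node; the children share the remaining n - 1.
    return count_hedges(label, start, n - 1)


@lru_cache(maxsize=None)
def count_hedges(label, state, m):
    """Number of child sequences with m nodes in total that lead the
    content-model automaton of `label` from `state` to an accepting state."""
    _, accepting, delta = DTD[label]
    if m == 0:
        return 1 if state in accepting else 0
    total = 0
    for (src, child), dst in delta.items():
        if src != state:
            continue
        # The first child is a subtree with k nodes; the rest has m - k.
        for k in range(1, m + 1):
            total += count_trees(child, k) * count_hedges(label, dst, m - k)
    return total


def count_up_to(label, size):
    """Number of trees with root `label` and at most `size` nodes."""
    return sum(count_trees(label, n) for n in range(1, size + 1))
```

For the single rule with content model a*, the trees with n nodes are the ordered trees, so `count_trees("a", n)` yields the Catalan numbers 1, 1, 2, 5, 14 for n = 1 to 5. Given such counts for two languages and for their intersection, a similarity measure based on common and different trees can be computed from the three numbers.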