Efficient asymmetric inclusion between regular expression types

Authors:
Dario Colazzo;Giorgio Ghelli;Carlo Sartiani
Affiliations:
Université Paris Sud, Orsay;Università di Pisa, Pisa - Italy;Università di Pisa, Pisa - Italy
Venue:
Proceedings of the 12th International Conference on Database Theory
Year:
2009

References (8):
The complexity of word problems—this time with interleaving. Information and Computation.
The LCA Problem Revisited. LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics.
Inference of concise DTDs from XML data. VLDB '06 Proceedings of the 32nd international conference on Very large data bases.
Expressiveness and complexity of XML Schema. ACM Transactions on Database Systems (TODS).
Linear time membership in a class of regular expressions with interleaving and counting. Proceedings of the 17th ACM conference on Information and knowledge management.
Efficient inclusion for a class of XML types with interleaving and counting. DBPL'07 Proceedings of the 11th international conference on Database programming languages.
Optimizing schema languages for XML: numerical constraints and interleaving. ICDT'07 Proceedings of the 11th international conference on Database Theory.
Efficient incremental validation of XML documents after composite updates. XSym'06 Proceedings of the 4th international conference on Database and XML Technologies.

Cited by (10):
Linear inclusion for XML regular expression types. Proceedings of the 18th ACM conference on Information and knowledge management.
Subtyping algorithm of regular tree grammars with disjoint production rules. ICTAC'10 Proceedings of the 7th International colloquium conference on Theoretical aspects of computing.
Precision and complexity of XQuery type inference. Proceedings of the 13th international ACM SIGPLAN symposium on Principles and practices of declarative programming.
Weak inclusion for XML types. CIAA'11 Proceedings of the 16th international conference on Implementation and application of automata.
The complexity of evaluating path expressions in SPARQL. PODS '12 Proceedings of the 31st symposium on Principles of Database Systems.
Regular Expressions with Counting: Weak versus Strong Determinism. SIAM Journal on Computing.
The inclusion problem for regular expressions. Journal of Computer and System Sciences.
Weak inclusion for recursive XML types. CIAA'12 Proceedings of the 17th international conference on Implementation and Application of Automata.
The complexity of regular expressions and property paths in SPARQL. ACM Transactions on Database Systems (TODS) - Invited papers issue.
Almost-linear inclusion for XML regular expression types. ACM Transactions on Database Systems (TODS).

Abstract

The inclusion of Regular Expressions (REs) is the kernel of any subtype-checking algorithm for XML schema languages. XML applications would benefit from extending REs with interleaving and counting, but this is not feasible in general, since inclusion is EXPSPACE-complete for such extended REs. In [9] we introduced the notion of "conflict-free REs", extended REs with excellent complexity behaviour, including a cubic inclusion algorithm [9] and linear membership [10]. Conflict-free REs have interleaving and counting, but their complexity is tamed by the "conflict-free" limitations, which have been found to be satisfied by the vast majority of the content models published on the Web. However, the most important use of subtype checking is in the type-checking of XML manipulation languages. A type checker works by testing the inclusion of inferred subtypes in declared supertypes. The conflict-free restriction, while quite harmless for the human-defined supertype, is far too restrictive for the inferred subtype, whose shape is difficult to constrain. We show here that the PTIME inclusion algorithm can actually be extended to deal with totally unrestricted REs with counting and interleaving in the subtype position, provided that the supertype is conflict-free. This is exactly the expressive power that we need in order to use subtyping inside type-checking algorithms, and the cost of this generalized algorithm is only quadratic, which is as good as the best algorithm we have for the symmetric case (see [5]). The result is extremely surprising, since we had previously found that asymmetric inclusion becomes NP-hard as soon as the candidate subtype is enriched with binary intersection, a generalization that looked much more innocent than what we achieve here.
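To make the semantics of "REs with interleaving and counting" concrete, here is a minimal Python sketch of a membership check for such extended REs based on Brzozowski derivatives, a standard technique extended here to interleaving (`&`) and counting (`r[m..n]`). It is neither the linear membership algorithm of [10] nor the inclusion algorithm discussed above. It decides membership only, and without further simplification its derivative terms can grow quickly on long interleavings. The constructors (`Sym`, `Alt`, `Seq`, `Shuffle`, `Repeat`), the helpers `combine`, `deriv` and `member`, and the example content model `book` with its element names are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Hypothetical AST for REs with interleaving and counting (names are illustrative).
@dataclass(frozen=True)
class Empty:  # the empty language
    pass
@dataclass(frozen=True)
class Eps:  # the language containing only the empty word
    pass
@dataclass(frozen=True)
class Sym:  # a single element name
    a: str
@dataclass(frozen=True)
class Alt:  # union: l + r
    l: object
    r: object
@dataclass(frozen=True)
class Seq:  # concatenation: l . r
    l: object
    r: object
@dataclass(frozen=True)
class Shuffle:  # interleaving: l & r
    l: object
    r: object
@dataclass(frozen=True)
class Repeat:  # counting: r[lo..hi]; hi=None means unbounded; assumes lo <= hi
    r: object
    lo: int
    hi: Optional[int]

EMPTY, EPS = Empty(), Eps()

def alt(l, r):
    if isinstance(l, Empty):
        return r
    if isinstance(r, Empty) or l == r:
        return l
    return Alt(l, r)

def combine(ctor, l, r):
    # Empty annihilates both concatenation and interleaving; Eps is a unit for both.
    if isinstance(l, Empty) or isinstance(r, Empty):
        return EMPTY
    if isinstance(l, Eps):
        return r
    return l if isinstance(r, Eps) else ctor(l, r)

def nullable(e) -> bool:
    """True iff the empty word belongs to the language of e."""
    if isinstance(e, (Empty, Sym)):
        return False
    if isinstance(e, Alt):
        return nullable(e.l) or nullable(e.r)
    if isinstance(e, (Seq, Shuffle)):
        return nullable(e.l) and nullable(e.r)
    if isinstance(e, Repeat):
        return e.lo == 0 or nullable(e.r)
    return isinstance(e, Eps)

def deriv(e, c: str):
    """Derivative of e by symbol c: the words w such that c followed by w is in L(e)."""
    if isinstance(e, (Empty, Eps)):
        return EMPTY
    if isinstance(e, Sym):
        return EPS if e.a == c else EMPTY
    if isinstance(e, Alt):
        return alt(deriv(e.l, c), deriv(e.r, c))
    if isinstance(e, Seq):
        d = combine(Seq, deriv(e.l, c), e.r)
        return alt(d, deriv(e.r, c)) if nullable(e.l) else d
    if isinstance(e, Shuffle):  # c is consumed by the left or by the right operand
        return alt(combine(Shuffle, deriv(e.l, c), e.r), combine(Shuffle, e.l, deriv(e.r, c)))
    if isinstance(e, Repeat):  # d(r[m..n]) = d(r) . r[max(m-1,0)..n-1]
        if e.hi == 0:
            return EMPTY  # r[0..0] matches only the empty word
        rest_hi = None if e.hi is None else e.hi - 1
        rest = EPS if rest_hi == 0 else Repeat(e.r, max(e.lo - 1, 0), rest_hi)
        return combine(Seq, deriv(e.r, c), rest)
    raise TypeError(e)

def member(e, word) -> bool:
    for c in word:
        e = deriv(e, c)
    return nullable(e)

# Hypothetical content model: (title & author[1..3]) . year[0..1]
book = Seq(Shuffle(Sym("title"), Repeat(Sym("author"), 1, 3)), Repeat(Sym("year"), 0, 1))
```

For instance, `member(book, ["author", "title", "author", "year"])` holds because `title` may be interleaved anywhere among the one to three `author` elements, whereas a word with four `author` elements, or one starting with `year`, is rejected.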