Type inference for unique pattern matching

Authors: Stijn Vansummeren
Affiliations: Hasselt University and Transnational University of Limburg, Diepenbeek, Belgium
Venue: ACM Transactions on Programming Languages and Systems (TOPLAS)
Year: 2006

Abstract

Regular expression patterns provide a natural, declarative way to express constraints on semistructured data and to extract relevant information from it. Indeed, regular expression pattern matching is a core feature of the programming language Perl, surfaces in various UNIX tools such as sed and awk, and has recently been proposed in the context of the XML programming language XDuce. Since regular expressions can be ambiguous in general, different disambiguation policies have been proposed to obtain a unique matching strategy. We formally define the matching semantics under both (1) the POSIX and (2) the first and longest match disambiguation strategies. We show that the generally accepted method of defining the longest match in terms of the first match and recursion does not conform to the natural notion of longest match. We then solve the type inference problem for both disambiguation strategies, which consists of calculating the set of all subparts of input values that a subexpression can match under the given policy.
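
To make the ambiguity mentioned in the abstract concrete, the following Python sketch compares two disambiguation policies on a small pattern. It is an illustration under stated assumptions, not the paper's formal semantics or its inference algorithm. The pattern is restricted to a flat concatenation of subexpressions. The first-match policy is taken to be what Python's backtracking `re` engine does. The longest-match policy is approximated by brute force, preferring longer pieces from left to right, which is a simplification of the POSIX rule. The function names (`all_splits`, `first_match`, `leftmost_longest`, `inferred_pieces`) are hypothetical, and the "inference" runs over a finite sample of inputs rather than a regular input type.

```python
import re
from itertools import combinations_with_replacement


def all_splits(parts, text):
    """Yield every way to cut `text` into len(parts) pieces such that
    piece i fully matches the subexpression parts[i]."""
    compiled = [re.compile(p) for p in parts]
    n, k = len(text), len(parts)
    # Choose k-1 non-decreasing cut positions in [0, n].
    for cuts in combinations_with_replacement(range(n + 1), k - 1):
        bounds = (0, *cuts, n)
        pieces = tuple(text[bounds[i]:bounds[i + 1]] for i in range(k))
        if all(c.fullmatch(s) for c, s in zip(compiled, pieces)):
            yield pieces


def first_match(parts, text):
    """First-match policy: defer to Python's backtracking engine,
    which tries alternatives in the order they are written."""
    pattern = "".join(f"({p})" for p in parts)
    m = re.fullmatch(pattern, text)
    return m.groups() if m else None


def leftmost_longest(parts, text):
    """Simplified longest-match policy (an assumption of this sketch):
    among all valid splits, make earlier pieces as long as possible."""
    candidates = list(all_splits(parts, text))
    if not candidates:
        return None
    return max(candidates, key=lambda ps: tuple(len(s) for s in ps))


def inferred_pieces(policy, parts, inputs):
    """For a finite sample of inputs, collect the set of substrings that
    each subexpression matches under the given policy."""
    result = [set() for _ in parts]
    for text in inputs:
        pieces = policy(parts, text)
        if pieces is not None:
            for slot, piece in zip(result, pieces):
                slot.add(piece)
    return result


if __name__ == "__main__":
    parts = ["a|ab", "c|bcd", "d*"]  # i.e. (a|ab)(c|bcd)(d*)
    print(first_match(parts, "abcd"))       # ('a', 'bcd', '')
    print(leftmost_longest(parts, "abcd"))  # ('ab', 'c', 'd')
    sample = ["abcd", "ac", "abc"]
    print(inferred_pieces(first_match, parts, sample))
    print(inferred_pieces(leftmost_longest, parts, sample))
```

On the input `abcd` both policies accept the whole string but assign different substrings to the three subexpressions, so the set of values each subexpression can match depends on the policy. Over this sample the second subexpression can match `bcd` only under the first-match policy, and the third can match `d` only under the longest-match policy.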