The use of regular expressions for text search is widely known and well understood. It is therefore surprising that the standard techniques and tools are of limited use for searching structured text formatted with SGML or similar markup languages. Our experience with structured text search has led us to reexamine current practice. The generally accepted rule of “leftmost longest match” is an unfortunate choice and lies at the root of these difficulties. We instead propose a semantically cleaner rule. It applies to a variety of text search applications, including source code analysis, and has interesting properties in its own right. We have written a publicly available search tool that implements the theory presented in the article, and it has proved valuable in a variety of circumstances.
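The difficulty with leftmost longest matching on markup can be made concrete with a small sketch. The abstract does not spell out the proposed rule. This example *assumes* a rule that reports only those matching substrings that contain no other matching substring, and it contrasts that rule with leftmost longest matching on a tiny SGML-like string. The function names (`all_matches`, `leftmost_longest`, `non_nesting_matches`) are hypothetical. The enumeration is brute force, with O(n²) calls to `fullmatch`, so it is for illustration only and is not the article's algorithm.

```python
import re


def all_matches(pattern: str, text: str) -> list[tuple[int, int]]:
    """Return every (i, j) such that the whole pattern matches text[i:j], j > i.

    Brute-force enumeration; Python's `re` is a backtracking (leftmost-first)
    engine, so we enumerate spans explicitly instead of relying on its choice.
    """
    rx = re.compile(pattern, re.DOTALL)
    return [(i, j)
            for i in range(len(text))
            for j in range(i + 1, len(text) + 1)
            if rx.fullmatch(text, i, j)]


def leftmost_longest(pattern: str, text: str) -> list[tuple[int, int]]:
    """Scan left to right: take the leftmost start, then the longest end there."""
    matches = all_matches(pattern, text)
    result, pos = [], 0
    while True:
        candidates = [(i, j) for i, j in matches if i >= pos]
        if not candidates:
            return result
        start = min(i for i, _ in candidates)
        end = max(j for i, j in candidates if i == start)
        result.append((start, end))
        pos = end  # resume after the reported match (non-overlapping)


def non_nesting_matches(pattern: str, text: str) -> list[tuple[int, int]]:
    """Assumed alternative rule: keep matches that contain no other match."""
    matches = all_matches(pattern, text)
    return [(i, j) for i, j in matches
            if not any((i2, j2) != (i, j) and i <= i2 and j2 <= j
                       for i2, j2 in matches)]


if __name__ == "__main__":
    text = "<t>A</t><t>B</t>"
    pattern = r"<t>.*</t>"
    print([text[i:j] for i, j in leftmost_longest(pattern, text)])
    print([text[i:j] for i, j in non_nesting_matches(pattern, text)])
```

Under leftmost longest matching, the pattern `<t>.*</t>` swallows both elements as a single match, `<t>A</t><t>B</t>`. Under the assumed non-nesting rule, each element is reported separately as `<t>A</t>` and `<t>B</t>`, which is usually what a structured-text query intends.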