On the use of regular expressions for searching text

Authors: Charles L. A. Clarke; Gordon V. Cormack
Affiliation (both authors): University of Waterloo, Waterloo, Ontario, Canada
Venue: ACM Transactions on Programming Languages and Systems (TOPLAS)
Year: 1997




Abstract

The use of regular expressions for text search is widely known and well understood. It is therefore surprising that the standard techniques and tools prove to be of limited use for searching structured text formatted with SGML or similar markup languages. Our experience with structured text search has led us to reexamine current practice. The generally accepted rule of “leftmost longest match” is an unfortunate choice and is at the root of the difficulties. We instead propose a rule that is semantically cleaner. This rule applies to a variety of text search applications, including source code analysis, and has interesting properties in its own right. We have written a publicly available search tool that implements the theory in the article and has proved valuable in a variety of circumstances.
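
As a rough illustration of the difficulty the abstract describes, the sketch below runs a tag-matching pattern over a small SGML-like fragment using Python's `re` module. The fragment, the element name `title`, and the variable names are hypothetical and do not come from the article. Python's engine is a backtracking matcher, not a POSIX leftmost-longest one, but for this particular pattern the greedy result coincides with the leftmost longest match. The non-greedy variant is shown only for contrast; it is a feature of Python's `re` module and is not claimed to be the rule the article proposes.

```python
import re

# Hypothetical SGML-like fragment with two sibling elements (an assumption
# made for illustration; it does not appear in the article).
text = "<title>Regex</title><title>Search</title>"

# Greedy quantifier: starting at the leftmost possible position, the match
# extends as far as it can, so it swallows both elements at once.
longest = re.search(r"<title>.*</title>", text).group(0)

# Non-greedy quantifier: each match stops at the first closing tag,
# so the two elements are reported separately.
shortest = re.findall(r"<title>.*?</title>", text)

print(longest)
print(shortest)
```

The greedy match spans the whole fragment and hides the element boundaries, which is the kind of result that makes longest-match semantics awkward for structured text.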