Regular expression matching is a key task (and often a computational bottleneck) in a variety of software tools and applications, for instance the standard grep and sed utilities, scripting languages such as Perl, internet traffic analysis, XML querying, and protein searching. A regular expression is built by combining characters with union, concatenation, and Kleene star operators, and its length m is proportional to the number of characters. However, often the initial operation is to concatenate characters into fairly long strings, e.g., when searching for certain combinations of words in a firewall. As a result, the number k of strings in the regular expression is significantly smaller than m. Our main result is a new algorithm that essentially replaces m with k in the complexity bounds for regular expression matching. More precisely, after an O(m log k) time and O(m) space preprocessing of the expression, we can match it in a string presented as a stream of characters in O(k log w/w + log k) time per character, where w is the number of bits in a memory word. For large w, this corresponds to the previous best bound of O(m log w/w + log m). Prior to this work no O(k) bound per character was known. We further extend our solution to efficiently handle character class interval operators C{x, y}. Here, C is a set of characters and C{x, y}, where x and y are integers such that 0 ≤ x ≤ y, represents a string of length between x and y from C. These character class intervals generalize variable length gaps, which are frequently used for pattern matching in computational biology applications.
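
The semantics of the character class interval operator C{x, y} can be illustrated with a minimal Python sketch. It is not the algorithm of this work: it relies on direct checking and on Python's backtracking `re` module, and the helper names `in_interval` and `gap_pattern` are hypothetical. It shows only what C{x, y} denotes, and that a variable length gap is the special case in which C is the whole alphabet.

```python
import re

def in_interval(s, chars, x, y):
    """Return True if s is a string over `chars` with x <= len(s) <= y,
    i.e. if s is one of the strings denoted by C{x, y} with C = chars."""
    if not 0 <= x <= y:
        raise ValueError("expected 0 <= x <= y")
    return x <= len(s) <= y and all(c in chars for c in s)

def gap_pattern(left, right, x, y):
    """Compile `left`, then between x and y arbitrary characters, then `right`.
    The gap is C{x, y} with C taken to be every character (hence re.DOTALL)."""
    if not 0 <= x <= y:
        raise ValueError("expected 0 <= x <= y")
    gap = r".{%d,%d}" % (x, y)
    return re.compile(re.escape(left) + gap + re.escape(right), re.DOTALL)

dna = set("acgt")
print(in_interval("acg", dna, 2, 4))                         # length 3, all in C
print(gap_pattern("ab", "cd", 1, 3).fullmatch("abXcd") is not None)
```

Here `gap_pattern("ab", "cd", 1, 3)` accepts "ab" and "cd" separated by one to three characters, which is the kind of gapped pattern that arises in computational biology applications.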