This paper proposes a multithreaded regular expression (regexp) matching algorithm, M-DFA (multithreaded DFA), for parallel computer architectures such as multi-core processors and graphics processing units (GPUs). At the thread level, one thread is designated to traverse the DFA along one possible matching path until that path terminates; at the task level, multiple threads match the input symbols concurrently. Given a set of regexps, the total number of DFA state transitions in M-DFA is significantly smaller than in the traditional DFA counterpart. The saving comes from eliminating backtracking transitions, which commonly arise when concurrently active NFA states are mapped to DFA states, among other situations. Experimental results show that the proposed algorithm achieves a significant reduction in both the number of states and the number of state transitions. In addition, the proposed algorithm running on an Nvidia® GTX 480 is 35 times faster than the popular regexp library RE2 running on an Intel Core i7 CPU.
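The thread-level idea above (one thread follows one candidate matching path until it terminates, while other threads handle other paths) can be illustrated with a minimal sketch. This is not the paper's implementation: the pattern `ab+c`, the transition table, and the names `walk_from` and `find_matches` are all hypothetical, and the sketch assumes that each candidate path corresponds to one start offset in the input. Because the table contains no backtracking (failure) transitions, a missing transition simply ends the walk.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical anchored DFA for the pattern "ab+c" (an assumption for
# illustration). It has no failure/backtracking edges: if a symbol has
# no outgoing transition, the walk for that path just stops.
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
    (2, "b"): 2,
    (2, "c"): 3,
}
START_STATE = 0
ACCEPTING = {3}


def walk_from(text, start):
    """Follow one candidate matching path beginning at text[start].

    Returns the end offset (exclusive) of the first match on this path,
    or None if the path dies before reaching an accepting state.
    """
    state = START_STATE
    for pos in range(start, len(text)):
        state = TRANSITIONS.get((state, text[pos]))
        if state is None:
            return None  # path terminated: no transition for this symbol
        if state in ACCEPTING:
            return pos + 1
    return None  # input exhausted before acceptance


def find_matches(text, max_workers=4):
    """Run one walk per start offset and collect (start, end) matches."""
    starts = range(len(text))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        ends = list(pool.map(lambda s: walk_from(text, s), starts))
    return [(s, e) for s, e in zip(starts, ends) if e is not None]
```

Calling `find_matches("xabbcabcab")` returns the spans where `ab+c` occurs. Python threads are used here only to show the decomposition into independent walks; on a GPU each walk would presumably map to a hardware thread, and the speedups reported above depend on details the abstract does not give.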