Data-parallel finite-state machines

Authors:
Todd Mytkowicz; Madanlal Musuvathi; Wolfram Schulte
Affiliations:
Microsoft Research, Redmond, USA; Microsoft Research, Redmond, USA; Microsoft, Redmond, USA
Venue:
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Year:
2014


Abstract

A finite-state machine (FSM) is an important abstraction for solving several problems, including regular-expression matching, text tokenization, and Huffman decoding. FSM computations typically involve data-dependent iterations with unpredictable memory-access patterns, which makes them difficult to parallelize. This paper describes a parallel algorithm for FSMs that breaks dependences across iterations by efficiently enumerating transitions from all possible states on each input symbol. This allows the algorithm to exploit the various sources of data parallelism available on modern hardware, including vector instructions and multiple processors or cores. On benchmarks from three FSM applications (regular expressions, Huffman decoding, and HTML tokenization), the parallel algorithm achieves up to a 3x speedup over optimized sequential baselines on a single core, and linear speedups of up to 21x on 8 cores.
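
The enumeration idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-state machine, the table layout, and the names `TRANSITIONS`, `enumerate_chunk`, `compose`, and `run_chunked` are all assumptions made for illustration. Each chunk of input is turned into a mapping from every possible start state to the state reached at the end of the chunk. Because these mappings compose associatively, the chunks no longer depend on each other and could be processed independently.

```python
from functools import reduce

# Hypothetical 3-state DFA over {'a', 'b'} that accepts any string containing "ab".
# State 0: no progress, state 1: last symbol was 'a', state 2: "ab" seen (absorbing).
# TRANSITIONS[symbol][state] gives the next state. This layout is an assumption.
TRANSITIONS = {
    "a": [1, 1, 2],
    "b": [0, 2, 2],
}
NUM_STATES = 3
START, ACCEPT = 0, 2


def run_sequential(text, state=START):
    """Baseline: each step depends on the state produced by the previous step."""
    for ch in text:
        state = TRANSITIONS[ch][state]
    return state


def enumerate_chunk(chunk):
    """Return a list m where m[s] is the state reached from s after reading chunk."""
    mapping = list(range(NUM_STATES))  # identity: nothing consumed yet
    for ch in chunk:
        row = TRANSITIONS[ch]
        # Advance the run from every possible start state by one symbol.
        mapping = [row[s] for s in mapping]
    return mapping


def compose(first, second):
    """Mapping for 'first chunk, then second chunk'. Composition is associative."""
    return [second[s] for s in first]


def run_chunked(text, num_chunks=4):
    """Split the input, enumerate each chunk independently, then combine."""
    size = max(1, -(-len(text) // num_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # The per-chunk calls share no state, so they could run on separate cores.
    mappings = map(enumerate_chunk, chunks)
    total = reduce(compose, mappings, list(range(NUM_STATES)))
    return total[START]


if __name__ == "__main__":
    text = "bbaab"
    print(run_sequential(text) == ACCEPT, run_chunked(text) == ACCEPT)
```

This sketch runs the chunks one after another and uses plain lists, so it shows only the structure of the computation, not any speedup. The vector instructions and multiple cores that the abstract mentions are not modeled here.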