A Speculative Parallel DFA Membership Test for Multicore, SIMD and Cloud Computing Environments

Authors:
Yousun Ko;Minyoung Jung;Yo-Sub Han;Bernd Burgstaller
Affiliations:
Department of Computer Science, Yonsei University, Seoul, Korea;Department of Computer Science, Yonsei University, Seoul, Korea;Department of Computer Science, Yonsei University, Seoul, Korea;Department of Computer Science, Yonsei University, Seoul, Korea
Venue:
International Journal of Parallel Programming
Year:
2014

Abstract

We present techniques to parallelize membership tests for Deterministic Finite Automata (DFAs). Our method searches arbitrary regular expressions by matching multiple bytes in parallel using speculation. We partition the input string into chunks, match chunks in parallel, and combine the matching results. Our parallel matching algorithm exploits structural DFA properties to minimize the speculative overhead. Unlike previous approaches, our speculation is failure-free, i.e., (1) sequential semantics are maintained, and (2) speed-downs are avoided altogether. On architectures with a SIMD gather-operation for indexed memory loads, our matching operation is fully vectorized. The proposed load-balancing scheme uses an off-line profiling step to determine the matching capacity of each participating processor. Based on matching capacities, DFA matches are load-balanced on inhomogeneous parallel architectures such as cloud computing environments. We evaluated our speculative DFA membership test for a representative set of benchmarks from the Perl-compatible Regular Expression (PCRE) library and the PROSITE protein database. Evaluation was conducted on a 4 CPU (40 cores) shared-memory node of the Intel Academic Program Manycore Testing Lab (Intel MTL), on the Intel AVX2 SDE simulator for 8-way fully vectorized SIMD execution, and on a 20-node (288 cores) cluster on the Amazon EC2 computing cloud. Obtained speedups are on the order of $$\mathcal{O}\left(1+\frac{|P|-1}{|Q|\cdot \gamma}\right)$$, where $$|P|$$ denotes the number of processors or SIMD units, $$|Q|$$ denotes the number of DFA states, and $$0 < \gamma \le 1$$ represents a statically computed DFA property. For all observed cases, we found that $$0.02 < \gamma < 0.47$$. Actual speedups range from 2.3$$\times$$ to 38.8$$\times$$ for up to 512 DFA states for PCRE, and between 1.3$$\times$$ and 19.9$$\times$$ for up to 1,288 DFA states for PROSITE on a 40-core MTL node. Speedups on the EC2 computing cloud range from 5.0$$\times$$ to 65.8$$\times$$ for PCRE, and from 5.0$$\times$$ to 138.5$$\times$$ for PROSITE. Speedups of our C-based DFA matcher over the Perl-based ScanProsite scan tool range from 559.3$$\times$$ to 15079.7$$\times$$ on a 40-core MTL node. We show the scalability of our approach for input-sizes of up to 10 GB.
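
The chunk-and-combine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' C implementation: the function names (`run_chunk`, `speculative_match`, `modeled_speedup`), the transition-table layout (a list of per-state dictionaries), and the use of a thread pool are all assumptions made for illustration. The sketch speculates naively by trying every DFA state as a possible start state for each chunk after the first, and it does not attempt the structural optimization the abstract mentions for reducing that overhead. `modeled_speedup` simply evaluates the expression inside the stated bound and ignores any constants hidden by the O-notation.

```python
from concurrent.futures import ThreadPoolExecutor


def run_chunk(delta, start_states, chunk):
    """Map each candidate start state to the state reached after reading chunk."""
    mapping = {}
    for q0 in start_states:
        q = q0
        for sym in chunk:
            q = delta[q][sym]
        mapping[q0] = q
    return mapping


def sequential_match(delta, start, accepting, text):
    """Reference sequential membership test."""
    q = start
    for sym in text:
        q = delta[q][sym]
    return q in accepting


def speculative_match(delta, start, accepting, text, workers=4):
    """Chunked membership test: match chunks independently, then combine.

    Assumption: every state is a candidate start state for chunks 1..n-1
    (naive speculation). Chunk 0 is matched from the real start state only.
    """
    all_states = range(len(delta))
    size = max(1, -(-len(text) // workers))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    if not chunks:
        return start in accepting
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_chunk, delta, [start] if i == 0 else all_states, c)
            for i, c in enumerate(chunks)
        ]
        mappings = [f.result() for f in futures]
    # Combine: thread the true state through the per-chunk mappings in order.
    q = start
    for mapping in mappings:
        q = mapping[q]
    return q in accepting


def modeled_speedup(p, q, gamma):
    """Evaluate 1 + (|P| - 1) / (|Q| * gamma), the expression in the stated bound."""
    return 1 + (p - 1) / (q * gamma)


if __name__ == "__main__":
    # Hypothetical DFA over {a, b} accepting strings with an even number of 'a'.
    delta = [{"a": 1, "b": 0}, {"a": 0, "b": 1}]
    print(speculative_match(delta, 0, {0}, "abba"))
    print(modeled_speedup(40, 512, 0.02))
```

Because the combined result is obtained by following the real start state through the per-chunk mappings, the outcome always agrees with `sequential_match`, which mirrors the "sequential semantics are maintained" property claimed in the abstract. In CPython the thread pool will not yield real parallel speedup for this CPU-bound loop; it only shows the structure of the computation.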