High-performance regular expression scanning on the Cell/B.E. processor

Authors:
Daniele Paolo Scarpazza; Gregory F. Russell
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY, USA; IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Venue:
Proceedings of the 23rd international conference on Supercomputing
Year:
2009

Abstract

Matching regular expressions (regexps) is a very common workload. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in every search engine indexer. Tokenization also consumes 30% or more of most XML processors' execution time and is the first stage of any programming language compiler. Despite the multi-core revolution, regexp scanner generators like flex have not changed much in 20 years, and they do not exploit the power of recent multi-core architectures (e.g., multiple threads and wide SIMD units). This is unfortunate, especially given the pervasive importance of search engines and the fast growth of our digital universe. Indexing such data volumes demands precisely the processing power that multi-cores are designed to offer. We present an algorithm and a set of techniques that use multi-core features such as multiple threads and SIMD instructions to perform parallel regexp-based tokenization. As a proof of concept, we present a family of optimized kernels that implement our algorithm, providing the features of flex on the Cell/B.E. processor at top performance. Our kernels achieve almost-ideal resource utilization (99.2% of the clock cycles are non-NOP issues). They deliver a peak throughput of 14.30 Gbps per Cell chip, and 9.76 Gbps on Wikipedia input, a performance comparable to that of dedicated hardware solutions. Our kernels also show speedups of 57-81× over flex on the Cell. Our approach is valuable because it is easily portable to other SIMD-enabled processors, and there is a general trend toward more and wider SIMD instructions in architecture design.
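
For readers unfamiliar with regexp-based tokenization, the following is a minimal sketch of the sequential, table-driven scanning loop that generators in the style of flex are commonly understood to produce. It is not the parallel SIMD algorithm of the paper, which the abstract does not detail. The two token classes (WORD and NUMBER), the state names, and the helper names `char_class` and `tokenize` are assumptions made purely for illustration.

```python
# Illustrative only: a hypothetical two-token scanner, not the authors' kernels.
# Token classes (assumed): WORD = [A-Za-z]+, NUMBER = [0-9]+.
START, IN_WORD, IN_NUMBER, DEAD = 0, 1, 2, -1

# Accepting states and the token kind each one recognizes.
ACCEPTING = {IN_WORD: "WORD", IN_NUMBER: "NUMBER"}


def char_class(ch):
    """Map a character to a column of the transition table."""
    if ch.isascii() and ch.isalpha():
        return 0  # ASCII letter
    if ch.isascii() and ch.isdigit():
        return 1  # ASCII digit
    return 2      # anything else (treated as a separator)


# TRANSITIONS[state][char_class] gives the next state, or DEAD if none exists.
TRANSITIONS = [
    [IN_WORD, IN_NUMBER, DEAD],  # from START
    [IN_WORD, DEAD, DEAD],       # from IN_WORD
    [DEAD, IN_NUMBER, DEAD],     # from IN_NUMBER
]


def tokenize(text):
    """Return (kind, lexeme) pairs using longest-match scanning."""
    tokens = []
    pos = 0
    while pos < len(text):
        state = START
        end = pos
        last_accept = None  # (kind, end offset) of the longest match so far
        while end < len(text):
            nxt = TRANSITIONS[state][char_class(text[end])]
            if nxt == DEAD:
                break
            state = nxt
            end += 1
            if state in ACCEPTING:
                last_accept = (ACCEPTING[state], end)
        if last_accept is None:
            pos += 1  # no token starts here: skip one separator character
        else:
            kind, stop = last_accept
            tokens.append((kind, text[pos:stop]))
            pos = stop
    return tokens


if __name__ == "__main__":
    print(tokenize("Cell 2009 SPE8"))
```

Each input character costs one table lookup and one data-dependent state update, so the loop is inherently serial within a single stream. Under that reading, gaining throughput from multiple threads and SIMD lanes requires restructuring the work, which is the kind of problem the abstract says the paper addresses.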