Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization

Authors:
Jamin Naghmouchi; Daniele Paolo Scarpazza; Mladen Berekovic
Affiliations:
IBM T.J. Watson Research Center, Yorktown Heights, NY and Technische Universität Braunschweig, Braunschweig, Germany; IBM T.J. Watson Research Center, Yorktown Heights, NY; Technische Universität Braunschweig, Braunschweig, Germany
Venue:
Proceedings of the 24th ACM International Conference on Supercomputing
Year:
2010



Abstract

We explore the intersection between an emerging class of architectures and a prominent workload: GPGPUs (General-Purpose Graphics Processing Units) and regular expression matching, respectively. The task is challenging because this workload, with its irregular, non-coalesceable memory access patterns, is very different from the regular, numerical workloads that run efficiently on GPGPUs. Small-ruleset regular expression matching is a fundamental building block for search engines, business analytics, natural language processing, XML processing, compiler front-ends and network security. Despite the abundant computational power that GPGPUs promise, little work has investigated their potential and limitations with this workload, or how best to use the memory classes that GPGPUs offer. We describe an optimization path for the kernel of flex (the popular, open-source regular expression scanner generator) targeting four nVidia GPGPU models, with decisions based on quantitative micro-benchmarking, performance counters and simulator runs. Our solution achieves a tokenization throughput that exceeds the results reported so far for GPGPU-based string matching solutions, and compares well with solutions obtained on any architecture.
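
To illustrate why this workload produces irregular memory accesses, the following is a minimal sketch of a table-driven DFA tokenizer in the general style of a flex-generated scanner. It is not the paper's kernel: the two-rule set (lowercase words and digit runs), the table layout, and the names `char_class`, `TRANSITIONS`, `ACCEPTING` and `tokenize` are all assumptions made for illustration. The point to notice is that each table lookup depends on the state produced by the previous lookup and on the input character, so the address of the next read cannot be known in advance.

```python
# Hypothetical illustration only; not the kernel described in the paper.
# A tiny table-driven DFA tokenizer with longest-match semantics.

# Character classes for an assumed ruleset: WORD = [a-z]+, NUMBER = [0-9]+.
LETTER, DIGIT, OTHER = 0, 1, 2


def char_class(ch):
    """Map an input character to a column of the transition table."""
    if "a" <= ch <= "z":
        return LETTER
    if "0" <= ch <= "9":
        return DIGIT
    return OTHER


# States: 0 = start, 1 = inside a word, 2 = inside a number.
DEAD = -1  # marks "no transition"
TRANSITIONS = [
    # LETTER  DIGIT  OTHER
    [1,       2,     DEAD],  # state 0
    [1,       DEAD,  DEAD],  # state 1
    [DEAD,    2,     DEAD],  # state 2
]
ACCEPTING = {1: "WORD", 2: "NUMBER"}


def tokenize(text):
    """Return (kind, lexeme) pairs, skipping characters no rule matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        state = 0
        last_accept = None  # (end position, kind) of the longest match so far
        i = pos
        while i < len(text):
            # Data-dependent lookup: the row read here is chosen by the
            # previous state, the column by the current input character.
            state = TRANSITIONS[state][char_class(text[i])]
            if state == DEAD:
                break
            i += 1
            if state in ACCEPTING:
                last_accept = (i, ACCEPTING[state])
        if last_accept is None:
            pos += 1  # nothing matched at this position; skip one character
        else:
            end, kind = last_accept
            tokens.append((kind, text[pos:end]))
            pos = end
    return tokens
```

For example, `tokenize("abc 123 x9")` yields the tokens `abc`, `123`, `x` and `9`, labeled as words and numbers. On a GPGPU, many threads would each run such a loop over different input chunks, and since their states diverge, their table reads would generally fall on unrelated addresses.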