Proceedings of the 24th ACM International Conference on Supercomputing
Matching regular expressions (regexps) is a very common workload. For example, tokenization, which consists of recognizing words or keywords in a character stream, appears in every search engine indexer. Tokenization also consumes 30% or more of the execution time of most XML processors, and it is the first stage of any programming language compiler. Despite the multi-core revolution, regexp scanner generators such as flex have changed little in 20 years, and they do not exploit the features of recent multi-core architectures (e.g., multiple threads and wide SIMD units). This is unfortunate, given the pervasive importance of search engines and the fast growth of our digital universe: indexing such data volumes demands precisely the processing power that multi-cores are designed to offer. We present an algorithm and a set of techniques that use multiple threads and SIMD instructions to perform parallel regexp-based tokenization. As a proof of concept, we present a family of optimized kernels that implement our algorithm and provide the features of flex on the Cell/B.E. processor. Our kernels achieve almost-ideal resource utilization (99.2% of the clock cycles are non-NOP issues). They deliver a peak throughput of 14.30 Gbps per Cell chip, and 9.76 Gbps on Wikipedia input, a performance comparable to dedicated hardware solutions. Our kernels also show speedups of 57-81× over flex on the Cell. Our approach is valuable because it is easily portable to other SIMD-enabled processors, and there is a general trend toward more and wider SIMD instructions in architecture design.
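For readers unfamiliar with regexp-based tokenization, the following is a minimal sequential sketch of the table-driven DFA scanning that flex-style generators typically produce. It is not the parallel algorithm or the Cell kernels described above. The two token classes (`WORD`, `NUMBER`), the state names, and the `tokenize` function are hypothetical and chosen only for illustration.

```python
# Illustrative only: a hand-built DFA for two hypothetical token classes,
# WORD = [A-Za-z]+ and NUMBER = [0-9]+. This is a sequential baseline,
# not the paper's multi-threaded SIMD kernels.

START, IN_WORD, IN_NUMBER = 0, 1, 2
ACCEPTING = {IN_WORD: "WORD", IN_NUMBER: "NUMBER"}


def char_class(ch):
    """Map a character to a column of the transition table."""
    if ch.isascii() and ch.isalpha():
        return 0  # letter
    if ch.isascii() and ch.isdigit():
        return 1  # digit
    return 2  # anything else acts as a separator


# TRANSITIONS[state][char_class] -> next state, or None if no transition.
TRANSITIONS = [
    [IN_WORD, IN_NUMBER, None],  # START
    [IN_WORD, None, None],       # IN_WORD
    [None, IN_NUMBER, None],     # IN_NUMBER
]


def tokenize(text):
    """Return (kind, lexeme) pairs, taking the longest match at each position."""
    tokens = []
    pos = 0
    while pos < len(text):
        state = START
        end = pos
        # Follow transitions until the DFA has no move on the next character.
        while end < len(text):
            nxt = TRANSITIONS[state][char_class(text[end])]
            if nxt is None:
                break
            state = nxt
            end += 1
        if state in ACCEPTING:
            tokens.append((ACCEPTING[state], text[pos:end]))
            pos = end
        else:
            pos += 1  # no token starts here; skip one separator character
    return tokens


if __name__ == "__main__":
    print(tokenize("abc 123 x9"))
```

Every non-start state in this toy automaton is accepting, so the scanner never has to back up to an earlier accepting position. A general scanner must remember the last accepting state it passed through. The inner loop performs one dependent table lookup per input character, which gives a sense of why such scanners do not benefit automatically from additional threads or wider SIMD units.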