Recursive hashing functions for n-grams
ACM Transactions on Information Systems (TOIS)
Summary cache: a scalable wide-area Web cache sharing protocol
Proceedings of the ACM SIGCOMM '98 conference on Applications, technologies, architectures, and protocols for computer communication
A fast string searching algorithm
Communications of the ACM
Efficient string matching: an aid to bibliographic search
Communications of the ACM
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
A String Matching Algorithm Fast on the Average
Proceedings of the 6th Colloquium, on Automata, Languages and Programming
Approximate Multiple Strings Search
CPM '96 Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
The Bloomier filter: an efficient data structure for static support lookup tables
SODA '04 Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
Efficient randomized pattern-matching algorithms
IBM Journal of Research and Development - Mathematics and computing
Building high accuracy bloom filters using partitioned hashing
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Less hashing, same performance: Building a better Bloom filter
Random Structures & Algorithms
FAWN: a fast array of wimpy nodes
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
SplitScreen: enabling efficient, distributed malware detection
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Profiling and accelerating string matching algorithms in three network content security applications
IEEE Communications Surveys & Tutorials
Hi-index | 0.00 |
This article presents a new, memory efficient and cache-optimized algorithm for simultaneously searching for a large number of patterns in a very large corpus. This algorithm builds upon the Rabin-Karp string search algorithm and incorporates a new type of Bloom filter that we call a feed-forward Bloom filter. While it retains the asymptotic time complexity of previous multiple pattern matching algorithms, we show that this technique, along with a CPU architecture-aware design of the Bloom filter, can provide speed-ups between 2× and 30×, and memory consumption reductions as large as 50× when compared with grep. Our algorithm is also well suited for implementations on GPUs: A modern GPU can search for 3 million patterns at a rate of 580MB/s, and for 100 million patterns (a prohibitive number for traditional algorithms) at a rate of 170MB/s.