Language classification using n-grams accelerated by FPGA-based Bloom filters

Authors:
Arpith Jacob;Maya Gokhale
Affiliations:
Washington University in St. Louis, St. Louis, Missouri;Lawrence Livermore National Laboratory, Livermore, California
Venue:
HPRCTA '07 Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications: held in conjunction with SC07
Year:
2007

Abstract

N-gram counting, where an n-gram is an n-character sequence in a text document, is a well-established technique for classifying the language of a document. In this paper, n-gram processing is accelerated through the use of reconfigurable hardware on the XtremeData XD1000 system. Our design employs parallelism at multiple levels: parallel Bloom filters accessing on-chip RAM, parallel language classifiers, and parallel document processing. In contrast to another hardware implementation (the HAIL algorithm) that uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our implementation of end-to-end language classification runs 85x faster than comparable software and 1.45x faster than the competing hardware design.
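
The general idea of n-gram language classification with Bloom filters can be illustrated with a minimal software sketch. This is not the FPGA design described in the paper: the n-gram length of 4, the filter size, the number of hash functions, the SHA-256-based double hashing, and the choice to store every n-gram of a training sample (rather than a selected profile) are all assumptions made for illustration, and the names `BloomFilter`, `build_profile`, and `classify` are hypothetical.

```python
import hashlib


class BloomFilter:
    """Bit-array set with no false negatives and a small false-positive rate."""

    def __init__(self, num_bits=1 << 14, num_hashes=4):
        # Sizes are illustrative assumptions, not values from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive k bit positions from one digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


def ngrams(text, n=4):
    """Return all overlapping n-character sequences of text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_profile(sample, n=4):
    """Store the n-grams of a training sample in one Bloom filter per language."""
    bf = BloomFilter()
    for gram in ngrams(sample.lower(), n):
        bf.add(gram)
    return bf


def classify(text, profiles, n=4):
    """Count n-gram hits per language and return the best language and all scores."""
    grams = ngrams(text.lower(), n)
    scores = {lang: sum(g in bf for g in grams) for lang, bf in profiles.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    profiles = {
        "en": build_profile("the quick brown fox jumps over the lazy dog"),
        "es": build_profile("el rapido zorro marron salta sobre el perro perezoso"),
    }
    print(classify("quick brown fox", profiles))
```

In this sketch each language's filter is queried once per n-gram, and the per-language lookups are independent of one another, which is the kind of work a hardware design can perform in parallel.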