Language classification using n-grams accelerated by FPGA-based Bloom filters

  • Authors:
  • Arpith Jacob; Maya Gokhale

  • Affiliations:
  • Washington University in St. Louis, St. Louis, Missouri; Lawrence Livermore National Laboratory, Livermore, California

  • Venue:
  • HPRCTA '07 Proceedings of the 1st international workshop on High-performance reconfigurable computing technology and applications: held in conjunction with SC07
  • Year:
  • 2007

Abstract

N-gram counting (counting sequences of n characters in a text document) is a well-established technique for classifying the language of a document. In this paper, n-gram processing is accelerated with reconfigurable hardware on the XtremeData XD1000 system. Our design exploits parallelism at multiple levels: parallel Bloom filters accessing on-chip RAM, parallel language classifiers, and parallel document processing. Unlike another hardware implementation (the HAIL algorithm), which uses off-chip SRAM for lookup, our highly scalable implementation uses only on-chip memory blocks. Our end-to-end language classifier runs 85x faster than comparable software and 1.45x faster than the competing hardware design.
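The abstract does not spell out how the Bloom filters are used, so the following is only a minimal software sketch of the general idea, under stated assumptions: each language gets one Bloom filter loaded with the n-grams of a training sample, every n-gram of a document is probed against every filter, and the language with the most hits wins. The n-gram length (n = 4), filter size (2^14 bits), number of hash functions (k = 4), the SHA-256 double-hashing scheme, the training sentences, and the names `BloomFilter`, `ngrams`, and `LanguageClassifier` are all hypothetical choices for illustration, not the authors' FPGA design.

```python
import hashlib


class BloomFilter:
    """Bit-array Bloom filter; k positions derived by double hashing (an assumption)."""

    def __init__(self, num_bits: int = 1 << 14, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Split one SHA-256 digest into h1 and h2, then use h1 + i*h2 mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # keep h2 odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        # No false negatives; false positives are possible but rare here.
        return all(self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item))


def ngrams(text: str, n: int = 4):
    """Yield overlapping n-character sequences of the lowercased text."""
    text = text.lower()
    for i in range(len(text) - n + 1):
        yield text[i:i + n]


class LanguageClassifier:
    """One Bloom filter per language; score = number of document n-grams it contains."""

    def __init__(self, n: int = 4) -> None:
        self.n = n
        self.filters: dict[str, BloomFilter] = {}

    def train(self, language: str, sample_text: str) -> None:
        bf = self.filters.setdefault(language, BloomFilter())
        for gram in ngrams(sample_text, self.n):
            bf.add(gram)

    def scores(self, document: str) -> dict[str, int]:
        counts = {lang: 0 for lang in self.filters}
        for gram in ngrams(document, self.n):
            # The hardware probes all filters in parallel; this sketch loops.
            for lang, bf in self.filters.items():
                if gram in bf:
                    counts[lang] += 1
        return counts

    def classify(self, document: str) -> str:
        s = self.scores(document)
        return max(s, key=s.get)


if __name__ == "__main__":
    clf = LanguageClassifier()
    clf.train("english", "the quick brown fox jumps over the lazy dog and the cat")
    clf.train("german", "der schnelle braune fuchs springt über den faulen hund und die katze")
    print(clf.scores("the lazy brown dog"))
    print(clf.classify("the lazy brown dog"))
```

Because a Bloom filter never reports a false negative, a language's score can only be inflated, never deflated, by hash collisions; with a few dozen n-grams in a 16,384-bit filter that inflation is negligible. The nested loop over filters is where the paper's design gains its speed, by placing the filters in on-chip RAM and querying them concurrently rather than one after another.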