High speed CRC with 64-bit generator polynomial on an FPGA

  • Authors:
  • Amila Akagić; Hideharu Amano

  • Affiliations:
  • KEIO University, Yokohama, Japan; KEIO University, Yokohama, Japan

  • Venue:
  • ACM SIGARCH Computer Architecture News
  • Year:
  • 2011

Abstract

Deployment of jumbo frames larger than 9000 bytes in storage systems is limited by the 32-bit Cyclic Redundancy Check (CRC) used by network protocols. To overcome this limitation, we study the possibility of using 64-bit generator polynomials in software and hardware, using the fastest multiple-lookup-table algorithms for generating CRCs. CRC computation is a sequential process, so the throughput of software-based solutions is limited by the speed and architectural improvements of a single CPU. For hardware implementations, we study the tradeoff between using distributed LUTs and embedded BRAM. Our results show that the BRAM-based approach is the fastest hardware implementation, reaching a maximum of 347.37 Gbps while processing 1024 bits at a time, which is 606x faster than the software implementation of the same algorithm running on a 3.2 GHz Xeon with 2 MB of L2 cache. The proposed architectures have been implemented on a Xilinx Virtex 6 LX550T prototyping device and require less than 1% of the device's resources. Our research shows that throughput will continue to increase as the number of bits processed at a time grows.
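The multiple-lookup-table idea mentioned in the abstract can be illustrated in software. The sketch below is a minimal, hypothetical example, not the authors' implementation. It assumes the "slicing-by-8" formulation and the CRC-64/XZ parameters (the ECMA-182 polynomial 0x42F0E1EBA9EA3693 in reflected form, with an all-ones initial value and final XOR), because the abstract names neither the polynomial nor the table layout. The names `crc64_bitwise`, `make_tables` and `crc64_slicing8` are illustrative only.

```python
# Hypothetical sketch: CRC-64 with a multiple-lookup-table ("slicing-by-8") loop.
# Assumed parameters (CRC-64/XZ): ECMA-182 polynomial 0x42F0E1EBA9EA3693 in
# reflected form, init and final XOR of all ones. The paper's polynomial and
# table layout are not given in the abstract.

MASK64 = (1 << 64) - 1
POLY_REFLECTED = 0xC96C5795D7870F42  # bit-reversed 0x42F0E1EBA9EA3693


def crc64_bitwise(data: bytes) -> int:
    """Reference CRC: one bit per step, inherently sequential."""
    crc = MASK64
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
    return crc ^ MASK64


def make_tables(count: int = 8) -> list[list[int]]:
    """tables[k][b] = CRC state after byte b followed by k zero bytes."""
    t0 = []
    for b in range(256):
        crc = b
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        t0.append(crc)
    tables = [t0]
    for _ in range(1, count):
        prev = tables[-1]
        # Extend each entry by one more zero byte using the base table.
        tables.append([(v >> 8) ^ t0[v & 0xFF] for v in prev])
    return tables


TABLES = make_tables(8)


def crc64_slicing8(data: bytes) -> int:
    """Consume 8 bytes per step via 8 independent table lookups XORed together."""
    crc = MASK64
    i = 0
    while i + 8 <= len(data):
        # Fold the next 8 message bytes (little-endian, matching the reflected CRC).
        crc ^= int.from_bytes(data[i:i + 8], "little")
        acc = 0
        for j in range(8):
            # Byte j is followed by 7 - j more bytes in this block.
            acc ^= TABLES[7 - j][(crc >> (8 * j)) & 0xFF]
        crc = acc
        i += 8
    for byte in data[i:]:  # leftover tail, one byte per step
        crc = (crc >> 8) ^ TABLES[0][(crc ^ byte) & 0xFF]
    return crc ^ MASK64
```

The eight lookups inside each step are independent of one another, so only the XOR that combines them creates a dependency. A hardware version can presumably exploit this by reading many tables in parallel and merging the results with an XOR tree; in the abstract's terms, those tables would live in distributed LUTs or BRAM. Widening the block, for example to the 1024 bits mentioned above, adds more tables per step. In software, by contrast, the loop still walks through the message one block after another.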