Deployment of jumbo frame sizes beyond 9000 bytes for storage systems is limited by the 32-bit Cyclic Redundancy Checks (CRCs) used by a network protocol. To overcome this limitation, we study the possibility of using 64-bit polynomials in software and hardware, using the fastest multiple-lookup-table algorithms for generating CRCs. CRC computation is a sequential process, so the throughput of software-based solutions is limited by the speed and architectural improvements of a single CPU. We study the tradeoff between using distributed LUTs and embedded BRAM in hardware implementations. Our results show that the BRAM-based approach is the fastest hardware implementation, reaching a maximum of 347.37 Gbps while processing 1024 bits at a time, which is 606x faster than the software implementation of the same algorithm running on a 3.2 GHz Xeon with 2 MB of L2 cache. The proposed architectures have been implemented on a Xilinx Virtex 6 LX550T prototyping device, requiring less than 1% of the device's resources. Our research shows that throughput will continue to increase as we increase the number of bits processed at a time.
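
As an illustration of the multiple-lookup-table idea mentioned above, the following is a minimal software sketch of a 64-bit CRC that consumes 64 bits per step using eight 256-entry tables. It is not the implementation evaluated in the paper: the polynomial (the reflected ECMA-182 polynomial), the all-ones initial value and final XOR, and the names `crc64_bitwise`, `build_tables` and `crc64_slicing8` are assumptions chosen for the example. A bit-at-a-time version is included only as a reference for checking the table-driven one.

```python
# Assumed parameters (not taken from the paper): reflected ECMA-182
# polynomial, all-ones initial value and final XOR.
POLY = 0xC96C5795D7870F42
MASK = 0xFFFFFFFFFFFFFFFF


def crc64_bitwise(data: bytes) -> int:
    """Reference CRC: one bit per inner iteration."""
    crc = MASK
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ MASK


def build_tables(count: int = 8) -> list:
    """tables[k][b] is the CRC state after byte b followed by k zero bytes."""
    base = []
    for value in range(256):
        crc = value
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
        base.append(crc)
    tables = [base]
    for _ in range(1, count):
        prev = tables[-1]
        # Advance each entry by one more zero byte.
        tables.append([(e >> 8) ^ base[e & 0xFF] for e in prev])
    return tables


TABLES = build_tables()


def crc64_slicing8(data: bytes) -> int:
    """Table-driven CRC that processes 8 bytes (64 bits) per step."""
    crc = MASK
    full = len(data) - len(data) % 8
    for offset in range(0, full, 8):
        # Fold the next 8 input bytes into the 64-bit state.
        word = crc ^ int.from_bytes(data[offset:offset + 8], "little")
        crc = 0
        for j in range(8):
            # Byte j is followed by 7 - j further bytes of this block.
            crc ^= TABLES[7 - j][(word >> (8 * j)) & 0xFF]
    for byte in data[full:]:
        # Remaining tail: classic one-table, byte-at-a-time update.
        crc = (crc >> 8) ^ TABLES[0][(crc ^ byte) & 0xFF]
    return crc ^ MASK
```

The eight table lookups inside one step are independent of each other, which is what allows a hardware design to perform them in parallel and to widen the step (for example to 1024 bits) by adding more tables; in this sketch they are simply XORed together in a loop.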