Design and evaluation of random linear network coding accelerators on FPGAs

  • Authors:
  • Sunwoo Kim; Won Seob Jeong; Won W. Ro; Jean-Luc Gaudiot

  • Affiliations:
  • Hyundai Motor Company, Republic of Korea; Yonsei University, Seoul, Republic of Korea; Yonsei University, Seoul, Republic of Korea; University of California, Irvine

  • Venue:
  • ACM Transactions on Embedded Computing Systems (TECS)
  • Year:
  • 2013

Abstract

Network coding is a well-known technique that enhances network throughput and reliability by coding data packets together rather than merely forwarding them. In practice, one critical problem with random linear network coding is its high computational overhead. More specifically, using this technique in embedded systems with low computational power can cause serious delays due to its complex Galois field operations and matrix handling. To address this problem, this article proposes a high-performance decoding logic for random linear network coding using field-programmable gate array (FPGA) technology. We expect the inherent reconfigurability of FPGAs to provide sufficient performance as well as the programmability to cope with changes in the coding specification. The main design motivation was to reduce decoding delay by dividing and parallelizing the entire decoding process. Fast arithmetic is achieved by the proposed parallelized Galois field (GF) ALUs, which process all elements of a single matrix row concurrently. To improve flexibility in the utilization of FPGA resources, two different decoding methods have been designed and compared. The proposed design is evaluated against an equivalent software decoding algorithm executed on general-purpose processors. Overall, the proposed FPGA design achieves a maximum throughput of 65.98 Mbps on a Virtex-5 XC5VLX110T device. In addition, it provides speedups of up to 13.84 over an aggressively parallelized software decoding algorithm run on a quad-core AMD processor. Moreover, the design affords 12 times higher power efficiency, in terms of throughput per watt, than an ARM Cortex-A9 processor.
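
For readers less familiar with the Galois field arithmetic the abstract refers to, the sketch below illustrates GF(2^8) multiplication and a single row-elimination step of the Gaussian elimination used in RLNC decoding; the per-element updates in such a step are independent, which is the kind of row-level parallelism a bank of parallel GF ALUs can exploit. This is only an illustrative software sketch: the reduction polynomial 0x11D and the functions gf_mul and gf_row_eliminate are assumptions for exposition, not taken from the paper's FPGA design.

```c
/*
 * Illustrative sketch of GF(2^8) arithmetic in RLNC decoding.
 * The polynomial 0x11D and these helper functions are assumptions
 * for illustration; they do not describe the paper's FPGA logic.
 */
#include <stdint.h>
#include <stdio.h>

#define GF_POLY 0x11D  /* x^8 + x^4 + x^3 + x^2 + 1, a common RLNC choice */

/* Shift-and-add multiplication in GF(2^8) with modular reduction. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint16_t acc = 0;
    uint16_t aa  = a;
    while (b) {
        if (b & 1)
            acc ^= aa;          /* addition in GF(2^m) is XOR        */
        aa <<= 1;
        if (aa & 0x100)
            aa ^= GF_POLY;      /* reduce when degree reaches 8      */
        b >>= 1;
    }
    return (uint8_t)acc;
}

/*
 * One elimination step: row_dst -= factor * row_src.
 * Each element update is independent of the others, so all columns
 * of the row could be processed concurrently by parallel GF ALUs.
 */
static void gf_row_eliminate(uint8_t *row_dst, const uint8_t *row_src,
                             uint8_t factor, size_t len)
{
    for (size_t i = 0; i < len; i++)
        row_dst[i] ^= gf_mul(factor, row_src[i]);
}

int main(void)
{
    uint8_t src[4] = {0x57, 0x13, 0x01, 0xAB};
    uint8_t dst[4] = {0x83, 0x44, 0xF0, 0x02};

    gf_row_eliminate(dst, src, 0x2A, 4);
    for (int i = 0; i < 4; i++)
        printf("%02X ", dst[i]);
    printf("\n");
    return 0;
}
```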