A Scalable LDPC Decoder on GPU

Authors:
Kiran Kumar Abburi
Affiliations:
-
Venue:
VLSID '11 Proceedings of the 2011 24th International Conference on VLSI Design
Year:
2011

Abstract

This paper presents a flexible and scalable approach to LDPC decoding on a CUDA-based Graphics Processing Unit (GPU). Layered decoding is a popular method for LDPC decoding and is known for its fast convergence. However, implementing the layered decoding algorithm efficiently on a GPU is challenging because the algorithm offers only a limited amount of data parallelism. To overcome this problem, a kernel execution configuration that decodes multiple codewords simultaneously on the GPU is developed. The paper proposes a compact data packing scheme to reduce the number of global memory accesses and a parity-check matrix representation to reduce constant memory latency. Global memory bandwidth efficiency is improved by coalescing the simultaneous memory accesses of the threads in a half-warp into a single memory transaction. Asynchronous data transfers hide host memory latency by overlapping kernel execution with data transfers between the CPU and GPU. The proposed GPU implementation of the LDPC decoder runs two orders of magnitude faster than an LDPC decoder on a CPU and four times faster than the previously reported LDPC decoder on a GPU. It achieves a throughput of 160 Mbps, which is comparable to dedicated hardware solutions.
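
To illustrate why layered decoding exposes little data parallelism within a single codeword, the following is a minimal CPU-side sketch of layered min-sum decoding in plain Python. It is not the paper's CUDA kernel. The min-sum check-node update, the function names `layered_min_sum` and `decode_batch`, the LLR sign convention, and the small Hamming(7,4) parity-check matrix in `HAMMING_LAYERS` are all assumptions chosen for illustration. Each layer reads the posteriors written by the previous layer, so the layers must run in sequence, whereas separate codewords are independent of each other.

```python
def layered_min_sum(llrs, layers, max_iters=10):
    """Decode one codeword with layered min-sum (illustrative sketch).

    llrs   : channel log-likelihood ratios; positive means bit 0 is more likely
    layers : parity-check rows, each given as a list of variable indices
    """
    posterior = list(llrs)
    # One check-to-variable message per edge of the parity-check matrix.
    c2v = [[0.0] * len(row) for row in layers]
    bits = [1 if p < 0 else 0 for p in posterior]
    for _ in range(max_iters):
        # Layers are processed one after another: each layer uses the
        # posteriors already updated by the layers before it.
        for r, row in enumerate(layers):
            # Subtract this row's previous message to get variable-to-check values.
            v2c = [posterior[v] - c2v[r][k] for k, v in enumerate(row)]
            for k, v in enumerate(row):
                others = v2c[:k] + v2c[k + 1:]
                negatives = sum(1 for x in others if x < 0)
                sign = -1.0 if negatives % 2 else 1.0
                c2v[r][k] = sign * min(abs(x) for x in others)
                posterior[v] = v2c[k] + c2v[r][k]
        bits = [1 if p < 0 else 0 for p in posterior]
        # Stop early once every parity check is satisfied.
        if all(sum(bits[v] for v in row) % 2 == 0 for row in layers):
            break
    return bits


def decode_batch(batch, layers, max_iters=10):
    """Decode several independent codewords.

    The loop here is sequential; on a GPU these independent decodes are
    the work that could be spread across threads.
    """
    return [layered_min_sum(llrs, layers, max_iters) for llrs in batch]


# Hypothetical example code: rows of a Hamming(7,4) parity-check matrix.
HAMMING_LAYERS = [[0, 1, 3, 4], [0, 2, 3, 5], [1, 2, 3, 6]]
```

Calling `decode_batch([[2, 2, 2, 2, -0.5, 2, 2]], HAMMING_LAYERS)` decodes one noisy copy of the all-zero codeword in which bit 4 was received with the wrong sign. Adding more LLR lists to the batch adds more independent work items, and that independence across codewords is the parallelism the abstract describes exploiting.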