Turbo codes are computationally intensive channel codes that are widely used in current and upcoming wireless standards. The general-purpose graphics processing unit (GPGPU) is a programmable commodity processor that achieves high computational throughput by using many simple cores. In this paper, we present a 3GPP LTE compliant Turbo decoder accelerator that exploits the processing power of the GPU to deliver fast Turbo decoding throughput. Several techniques are used to improve the performance of the decoder. To fully utilize the computational resources of the GPU, our decoder decodes multiple codewords simultaneously, divides the workload for a single codeword across multiple cores, and packs multiple codewords to fill the single instruction multiple data (SIMD) instruction width. In addition, we use shared memory judiciously to enable hundreds of concurrent threads while keeping frequently used data local, so that memory accesses remain fast. To improve the efficiency of the decoder in the high-SNR regime, we also present a low-complexity early termination scheme based on average extrinsic LLR statistics. Finally, we examine how different workload partitioning choices affect the error correction performance and the throughput of the decoder.
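The early termination idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the fixed threshold value, and the iteration loop are all assumptions; the paper specifies only that the stopping rule is based on average extrinsic LLR statistics, under which the decoder stops iterating once the mean magnitude of the extrinsic log-likelihood ratios indicates high confidence in the hard decisions.

```python
def should_terminate(extrinsic_llrs, threshold=10.0):
    """Hypothetical early-termination check for an iterative Turbo decoder.

    Stops iterating once the average magnitude of the extrinsic LLRs
    exceeds a fixed threshold, suggesting the decoder has converged.
    The threshold value (10.0) is illustrative, not from the paper.
    """
    avg_abs_llr = sum(abs(l) for l in extrinsic_llrs) / len(extrinsic_llrs)
    return avg_abs_llr > threshold


# Confident LLRs (large magnitudes) trigger early termination:
print(should_terminate([22.5, -18.0, 31.2, -27.4]))   # True

# Weak LLRs (typical of low SNR) keep the decoder iterating:
print(should_terminate([0.8, -1.1, 0.3, -0.6]))       # False
```

Because the check reduces to a single average over the extrinsic LLR block, it adds only O(K) work per half-iteration, which is what makes the scheme low-complexity compared to CRC-based stopping tests.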