Generic multiphase software pipelined partial FFT on instruction level parallel architectures

Authors:
Min Li;David Novo;Bruno Bougard;Trevor Carlson;Liesbet Van Der Perre;Francky Catthoor
Affiliations:
ESAT, K. U. Leuven, Leuven, Belgium, and Nomadic Embedded System Division, IMEC, Leuven, Belgium;ESAT, K. U. Leuven, Leuven, Belgium, and Nomadic Embedded System Division, IMEC, Leuven, Belgium;Nomadic Embedded System Division, IMEC, Leuven, Belgium;Nomadic Embedded System Division, IMEC, Leuven, Belgium;Nomadic Embedded System Division, IMEC, Leuven, Belgium;ESAT, K. U. Leuven, Leuven, Belgium, and Nomadic Embedded System Division, IMEC, Leuven, Belgium
Venue:
IEEE Transactions on Signal Processing
Year:
2009

Abstract

The partial fast Fourier transform (PFFT) is an extension of the fast Fourier transform (FFT) in which only some of the input or output bins are used. Pruning the useless data flow can yield a significant speedup in many important applications. Although the theoretical aspects of the PFFT have been thoroughly studied over the past three decades, efficient and generic implementations have rarely been reported. The most important obstacle to optimizing the PFFT is its highly irregular data flow and the associated control flow. In addition, a size-N PFFT has 2^N possible data flow patterns, so finding a flexible yet efficient implementation is very challenging. Our contribution is a generic method to map the highly irregular data flow of an arbitrary PFFT onto instruction level parallel architectures using software pipelining. By leveraging the algorithmic-level flexibility of an FFT, we select an appropriate data flow variant that enables aggressive optimizations in the implementation schemes. We then apply a divide-and-conquer strategy, partitioning the PFFT into three phases. For each phase, we introduce specialized control structures, loop structures, address generation schemes and memory operations. This reduces the cycle count, the number of executed instructions and the number of memory accesses. By studying ten representative benchmarks from wireless baseband applications, we obtain repeatable and successful results on the TMS320C6000. Compared with two optimized FFT implementations, our work reduces the cycle count by 20.5% to 87.5%, executed instructions by 11.2% to 86.5%, and L1D and L1P cache accesses by 16.1% to 79.4% and 19.5% to 87.1%, respectively. To the best of our knowledge, this is the first reported work on a generic software pipelined PFFT for instruction level parallel architectures.
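To make the pruning idea concrete, here is a minimal Python sketch of an output-pruned radix-2 decimation-in-time FFT. It is not the multiphase software pipelined method described in the abstract; it only illustrates how skipping unneeded output bins removes work. The names `pruned_fft` and `naive_dft` are hypothetical, and the sketch assumes the input length is a power of two.

```python
import cmath


def naive_dft(x):
    """Reference O(N^2) DFT, used only to check the pruned result."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def pruned_fft(x, wanted):
    """Compute only the output bins listed in `wanted`.

    Returns (bins, updates): `bins` maps each wanted index k to X[k], and
    `updates` counts the twiddle multiply-add steps actually performed.
    Assumes len(x) is a power of two (illustrative restriction).
    """
    n = len(x)
    if not wanted:
        return {}, 0  # nothing requested from this sub-transform
    if n == 1:
        return {0: x[0]}, 0
    half = n // 2
    # Bin k of the size-n transform needs bin (k mod n/2) of both halves.
    sub_wanted = {k % half for k in wanted}
    even, even_updates = pruned_fft(x[0::2], sub_wanted)
    odd, odd_updates = pruned_fft(x[1::2], sub_wanted)
    bins = {}
    for k in wanted:
        twiddle = cmath.exp(-2j * cmath.pi * k / n)
        bins[k] = even[k % half] + twiddle * odd[k % half]
    return bins, even_updates + odd_updates + len(wanted)


if __name__ == "__main__":
    x = [complex(i, -i) for i in range(8)]
    _, full_updates = pruned_fft(x, set(range(8)))
    _, one_updates = pruned_fft(x, {0})
    print(full_updates, one_updates)  # 24 7
```

For a size-8 transform, requesting all bins costs 24 multiply-add steps in this sketch, while requesting only bin 0 costs 7. The set of wanted bins can be any of the 2^N subsets, which is why the data flow varies so much from one pruning pattern to another.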