Generic multi-phase software-pipelined Partial-FFT on instruction-level-parallel architectures and SDR baseband applications

Authors:
Min Li;David Novo;Bruno Bougard;Liesbet Van Der Perre;Francky Catthoor
Affiliations:
IMEC, Leuven, Belgium;IMEC, Leuven, Belgium;IMEC, Leuven, Belgium;IMEC, Leuven, Belgium;IMEC, Leuven, Belgium
Venue:
Proceedings of the conference on Design, automation and test in Europe
Year:
2008

Abstract

The partial FFT (PFFT) is an extended FFT in which only part of the input or output bins are used. By pruning the unused dataflow, the PFFT can potentially achieve a significant speedup in many important applications. Although the theoretical aspects of the PFFT have been thoroughly studied over the past three decades, efficient implementations have rarely been reported. The most important obstacle is the highly irregular dataflow and the associated control flow. In addition, a size-N PFFT has 2^N dataflow possibilities, so delivering both flexibility and efficiency in the same implementation is very challenging. This paper presents a generic scheme to map the highly irregular dataflow of an arbitrary PFFT onto instruction-level-parallel (ILP) architectures with highly efficient software pipelining (SWP). Constraints and opportunities of the algorithms and the architecture are carefully analyzed and exploited. We introduce a multi-phase partitioning that brings heterogeneous control structures and heterogeneous software pipelining schemes to minimize control overhead and maximize the efficiency of SWP. The proposal has been tested with 10 representative benchmarks extracted from baseband applications. In the experiments, cycle counts, instruction counts, NOPs, and L1D/L1P cache accesses, misses, and hits are thoroughly analyzed. Compared to full FFTs with efficient SWP, our work reduces cycle counts by 20.5% to 87.5%, instruction counts by 11.2% to 86.5%, L1D cache accesses by 16.1% to 79.4%, and L1P cache accesses by 19.5% to 87.1%. To the best of our knowledge, this is the first reported work on a generic software-pipelined PFFT on ILP architectures.
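
To make the idea of pruning the dataflow concrete, the following is a minimal sketch of an output-pruned radix-2 decimation-in-time FFT. It is not the multi-phase software-pipelined scheme of the paper; it only illustrates how the set of required butterflies depends on which output bins are wanted. The function names (`pruned_fft`, `naive_dft`, `bit_reverse_permute`) and the set-based bookkeeping are assumptions made for illustration.

```python
import cmath


def bit_reverse_permute(x):
    """Return a copy of x in bit-reversed index order (len(x) is a power of two)."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        r, v = 0, i
        for _ in range(bits):
            r = (r << 1) | (v & 1)
            v >>= 1
        out[r] = complex(x[i])
    return out


def naive_dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking results."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def pruned_fft(x, wanted):
    """Compute only the output bins listed in `wanted` (hypothetical helper).

    Returns (bins, butterflies): a dict {bin index: value} and the number
    of butterflies actually executed.
    """
    wanted = list(wanted)
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    stages = n.bit_length() - 1

    # Backward pass: needed[s] holds the indices whose values are required
    # after stage s. An output of stage s depends on itself and its partner.
    needed = [set() for _ in range(stages + 1)]
    needed[stages] = set(wanted)
    for s in range(stages, 0, -1):
        half = 1 << (s - 1)
        needed[s - 1] = {i for j in needed[s] for i in (j, j ^ half)}

    # Forward pass: run only butterflies with at least one needed output.
    a = bit_reverse_permute(x)
    butterflies = 0
    for s in range(1, stages + 1):
        half = 1 << (s - 1)
        m = half << 1
        for lo in sorted({i & ~half for i in needed[s]}):
            hi = lo | half
            w = cmath.exp(-2j * cmath.pi * (lo & (half - 1)) / m)
            t, u = w * a[hi], a[lo]
            a[lo], a[hi] = u + t, u - t
            butterflies += 1
    return {k: a[k] for k in wanted}, butterflies
```

For N = 8 and a single wanted bin, this sketch executes 7 butterflies instead of the 12 of a full radix-2 FFT. Because each bin is either wanted or not, the pruned dataflow differs per subset of bins, and the resulting butterfly pattern is irregular across stages.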