Pipeline FFT architectures optimized for FPGAs

Authors:
Bin Zhou; Yingning Peng; David Hwang
Affiliations:
Department of Electronic Engineering, Tsinghua University, Beijing, China and Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA; Department of Electronic Engineering, Tsinghua University, Beijing, China; Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA
Venue:
International Journal of Reconfigurable Computing - Special issue on selected papers from ReConFig 2008
Year:
2009

Abstract

This paper presents optimized implementations of two different pipeline FFT processors on Xilinx Spartan-3 and Virtex-4 FPGAs. Different optimization techniques and rounding schemes were explored. The implementations achieved better performance with lower resource usage than prior art. On the Spartan-3, the 16-bit 1024-point FFT with the R2²SDF architecture had a maximum clock frequency of 95.2 MHz and used 2802 slices, for a throughput-per-area ratio of 0.034 Msamples/s/slice; the R4SDC architecture ran at 123.8 MHz and used 4409 slices, for a ratio of 0.028 Msamples/s/slice. On the Virtex-4, the 16-bit 1024-point R2²SDF architecture ran at 235.6 MHz and used 2256 slices, for a ratio of 0.104 Msamples/s/slice; the 16-bit 1024-point R4SDC architecture ran at 219.2 MHz and used 3064 slices, for a ratio of 0.072 Msamples/s/slice. The R2²SDF was more efficient than the R4SDC in terms of throughput per area because of its simpler controller and an easier balanced rounding scheme. This paper also shows that balanced stage rounding is an appropriate rounding scheme for pipeline FFT processors.
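The throughput-per-area figures quoted in the abstract can be reproduced from the clock frequencies and slice counts. The following is a minimal sketch, assuming each pipeline accepts one sample per clock cycle, so that throughput in Msamples/s equals the clock frequency in MHz. That assumption is consistent with the reported ratios but is not stated in the abstract. The names `RESULTS`, `throughput_per_area`, and `ratios` are illustrative only.

```python
# Figures copied from the abstract: (device, architecture) -> (f_max in MHz, slices).
# "R22SDF" is the ASCII spelling of the radix-2^2 single-path delay feedback design.
RESULTS = {
    ("Spartan-3", "R22SDF"): (95.2, 2802),
    ("Spartan-3", "R4SDC"): (123.8, 4409),
    ("Virtex-4", "R22SDF"): (235.6, 2256),
    ("Virtex-4", "R4SDC"): (219.2, 3064),
}


def throughput_per_area(fmax_mhz, slices, samples_per_cycle=1.0):
    """Return Msamples/s per slice.

    samples_per_cycle=1.0 is an assumption (one sample per clock),
    not a value given in the abstract.
    """
    return fmax_mhz * samples_per_cycle / slices


def ratios():
    """Compute the ratio for every entry, rounded to three decimals."""
    return {
        key: round(throughput_per_area(fmax, slices), 3)
        for key, (fmax, slices) in RESULTS.items()
    }


if __name__ == "__main__":
    for (device, arch), ratio in ratios().items():
        print(f"{device:10s} {arch:7s} {ratio:.3f} Msamples/s/slice")
```

Under that assumption, the computed values round to the four ratios given in the abstract, with the R2²SDF ahead of the R4SDC on both devices.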