Fast Parallel FFT on a Reconfigurable Computation Platform

Authors:
Amir H. Kamalizad; Chengzhi Pan; Nader Bagherzadeh
Venue:
SBAC-PAD '03 Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing
Year:
2003

Abstract

This paper presents an implementation of a very fast parallel complex FFT on M2, the second generation of the MorphoSys reconfigurable computation platform, which targets streamed applications such as multimedia and DSP. The proposed mapping consists of fast presorting, cascaded radix-2 stages, and post-reordering. Data and twiddle factors have 16-bit real and 16-bit imaginary parts in two's complement format, and scaling is performed to avoid overflow. The mapping was tested on our cycle-accurate simulator, "Mulate", and its performance is better than that of other architectures such as Imagine and VIRAM. Moreover, performance scales with FFT size. Since M2 has no functionality specifically tailored to the FFT, the results demonstrate the ability of the MorphoSys architecture to extract parallelism from streamed applications. Further rationale is given based on the concepts of scalar operand networks and memory hierarchy.
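The number format and stage structure named in the abstract (16-bit two's complement complex data, cascaded radix-2 stages, scaling to avoid overflow) can be illustrated with a minimal Python sketch. It assumes Q15 fixed point, scaling by 1/2 in every butterfly, and a single bit-reversal permutation on the input. This is a generic textbook formulation, not the authors' M2 mapping: the paper's split presorting and post-reordering, and its parallel layout on the MorphoSys array, are not reproduced. The names `fft_q15`, `to_q15`, `saturate16`, and `bit_reverse` are hypothetical.

```python
import math

Q = 15                                  # assumed Q15 format: sign bit + 15 fractional bits
ONE = 1 << Q                            # 1.0 in Q15 (itself not representable in 16 bits)
MIN16, MAX16 = -(1 << 15), (1 << 15) - 1


def saturate16(x: int) -> int:
    """Clamp an integer to the signed 16-bit (two's complement) range."""
    return max(MIN16, min(MAX16, x))


def to_q15(x: float) -> int:
    """Quantize a real value in [-1, 1) to a 16-bit Q15 integer."""
    return saturate16(round(x * ONE))


def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_q15(re, im):
    """Hypothetical radix-2 DIT FFT on Q15 complex data, scaled by 1/2 per stage.

    Returns X[k] / N in Q15, where X is the unscaled DFT of the input.
    """
    n = len(re)
    bits = n.bit_length() - 1
    assert n == 1 << bits, "length must be a power of two"

    # Reordering step: bit-reversal permutation so the stages emit natural order.
    re = [re[bit_reverse(i, bits)] for i in range(n)]
    im = [im[bit_reverse(i, bits)] for i in range(n)]

    size = 2
    while size <= n:                    # one pass per cascaded radix-2 stage
        half = size // 2
        for k in range(half):
            # Twiddle W = exp(-2*pi*j*k/size), quantized to 16-bit Q15.
            angle = -2.0 * math.pi * k / size
            wr, wi = to_q15(math.cos(angle)), to_q15(math.sin(angle))
            for start in range(0, n, size):
                a, b = start + k, start + k + half
                # Complex product b * W; the 32-bit product is shifted back to Q15.
                tr = (re[b] * wr - im[b] * wi) >> Q
                ti = (re[b] * wi + im[b] * wr) >> Q
                # Butterfly with a right shift by 1 (scaling by 1/2) against overflow;
                # results are clamped to emulate 16-bit storage.
                re[a], re[b] = saturate16((re[a] + tr) >> 1), saturate16((re[a] - tr) >> 1)
                im[a], im[b] = saturate16((im[a] + ti) >> 1), saturate16((im[a] - ti) >> 1)
        size *= 2
    return re, im
```

With 1/2 scaling in each of the log2 N stages, the transform returns X[k]/N. Because |(a ± bW)/2| ≤ (|a| + |b|)/2, inputs whose complex magnitude stays at or below 1 keep every intermediate value within the 16-bit range, at the cost of discarding one low-order bit per stage. For example, an 8-point impulse of amplitude 0.5 (Q15 value 16384) yields 2048 (0.5/8) in every real output bin.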