Efficient 2D FFT implementation on mediaprocessors

Authors:
Coskun Mermer;Donglok Kim;Yongmin Kim
Affiliations:
Image Computing Systems Laboratory, Departments of Electrical Engineering and Bioengineering, Box 352500, University of Washington, Seattle, WA (all three authors)
Venue:
Parallel Computing
Year:
2003



Abstract

We have developed an efficient implementation of the 2D fast Fourier transform (FFT) on a new programmable very long instruction word (VLIW) mediaprocessor. Using instruction-level parallelism and a multimedia instruction set, our radix-4 Cooley-Tukey algorithm optimally maps the FFT computation to the processing resources of the Hitachi/Equator MAP mediaprocessor. By processing several columns in parallel during the column-wise stage of the row-column decomposition, we also achieved more efficient data I/O and lower data transfer time than traditional implementations. We used a programmable direct memory access engine and a double-buffering scheme in the data cache to overlap the computation with the data transfer. Our implementation achieved a total execution time of 22.4 ms for a 512 × 512-point 2D complex FFT, which is faster than previous single-chip programmable or dedicated solutions. Implementations on two other mediaprocessors, the TriMedia TM1100 and the BOPS ManArray, illustrate the importance of the instruction set architecture for achieving high performance and the trend toward data I/O becoming the limiting factor for 2D FFT performance on newer mediaprocessors.
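
The row-column decomposition and the radix-4 butterfly mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' MAP implementation: the function names `fft_radix4` and `fft2_row_column` and the parameter `cols_per_block` are hypothetical, the 1D transform is a plain recursive decimation-in-time form that only accepts power-of-4 lengths, and gathering a block of columns into a local list merely stands in for fetching several columns in one transfer. No VLIW scheduling, SIMD instructions, DMA, or double buffering is modeled.

```python
import cmath


def fft_radix4(x):
    """Recursive radix-4 decimation-in-time FFT (length must be a power of 4)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 4 != 0:
        raise ValueError("length must be a power of 4")
    q = n // 4
    # Four interleaved sub-transforms of length n/4.
    f0, f1, f2, f3 = (fft_radix4(x[r::4]) for r in range(4))
    out = [0j] * n
    for k in range(q):
        w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_n^k
        t1, t2, t3 = w * f1[k], w * w * f2[k], w * w * w * f3[k]
        # Radix-4 butterfly: a 4-point DFT whose multipliers are only 1, -1, j, -j.
        out[k] = f0[k] + t1 + t2 + t3
        out[k + q] = f0[k] - 1j * t1 - t2 + 1j * t3
        out[k + 2 * q] = f0[k] - t1 + t2 - t3
        out[k + 3 * q] = f0[k] + 1j * t1 - t2 - 1j * t3
    return out


def fft2_row_column(image, cols_per_block=4):
    """2D FFT by row-column decomposition, handling columns in small blocks."""
    # Row-wise stage: one 1D FFT per row.
    data = [fft_radix4(row) for row in image]
    n_rows, n_cols = len(data), len(data[0])
    # Column-wise stage: gather several adjacent columns at once
    # (a stand-in for one block transfer), transform them, write them back.
    for c0 in range(0, n_cols, cols_per_block):
        c1 = min(c0 + cols_per_block, n_cols)
        block = [[data[r][c] for r in range(n_rows)] for c in range(c0, c1)]
        block = [fft_radix4(col) for col in block]
        for j, col in enumerate(block):
            for r in range(n_rows):
                data[r][c0 + j] = col[r]
    return data


if __name__ == "__main__":
    # A unit impulse at the origin transforms to a constant spectrum.
    size = 16
    impulse = [[1.0 if (r, c) == (0, 0) else 0.0 for c in range(size)]
               for r in range(size)]
    spectrum = fft2_row_column(impulse)
    print(all(abs(v - 1) < 1e-12 for row in spectrum for v in row))
```

Grouping adjacent columns matters because, in a row-major image, the elements of a single column lie one row-length apart in memory, whereas several neighboring columns can be fetched together as short contiguous runs; the sketch reproduces only that access pattern, not its performance effect.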