Computer generation of fast Fourier transforms for the Cell Broadband Engine

Authors:
Srinivas Chellappa; Franz Franchetti; Markus Püschel
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA; Carnegie Mellon University, Pittsburgh, PA, USA; Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
Proceedings of the 23rd international conference on Supercomputing
Year:
2009

References (7):
Computational frameworks for the fast Fourier transform.
Formal loop merging for signal transforms. Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation.
Cell Multiprocessor Communication Network: Built for Speed. IEEE Micro.
FFT program generation for shared memory: SMP and multicore. Proceedings of the 2006 ACM/IEEE conference on Supercomputing.
Computer Generation of General Size Linear Transform Libraries. Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization.
A rewriting system for the vectorization of signal transforms. VECPAR'06 Proceedings of the 7th international conference on High performance computing for computational science.
FFTC: fastest Fourier transform for the IBM Cell Broadband Engine. HiPC'07 Proceedings of the 14th international conference on High performance computing.

Cited by (6):
Multi-FFT Vectorization for the Cell Multicore Processor. CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
Partitioning streaming parallelism for multi-cores: a machine learning based approach. Proceedings of the 19th international conference on Parallel architectures and compilation techniques.
Auto-tuning of fast Fourier transform on graphics processors. Proceedings of the 16th ACM symposium on Principles and practice of parallel programming.
An FFT performance model for optimizing general-purpose processor architecture. Journal of Computer Science and Technology - Special issue on Community Analysis and Information Recommendation.
Multicore acceleration of Discrete Event System Specification systems. Simulation.
A transpose-free in-place SIMD optimized FFT. ACM Transactions on Architecture and Code Optimization (TACO).

Abstract

The Cell BE is a multicore processor with eight vector accelerators (called SPEs) that implement explicit cache management through direct memory access engines. While the Cell has an impressive peak floating-point performance, programming and optimizing for it is difficult as it requires explicit memory management, multithreading, streaming, and vectorization. We address this problem for the discrete Fourier transform (DFT) by extending Spiral, a program generation system, to automatically generate highly optimized implementations for the Cell. The extensions include multi-SPE parallelization and explicit memory streaming, both performed at a high abstraction level using rewriting systems operating on Spiral's internal domain-specific language. Further, we support latency and throughput optimizations, single and double precision, and different data formats. The performance of Spiral's computer-generated code is comparable to, and sometimes better than, existing DFT implementations, where available.