Gather/scatter hardware support for accelerating Fast Fourier Transform

Authors:
Anderson Kuei-An Ku;Jingling Xue;Yong Guan
Affiliations:
School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia;School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia;College of Information Engineering, Capital Normal University, Beijing, China
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2010

Abstract

As we enter the multi-core era, boosting the performance of single-threaded applications remains critical. Raising the operating frequency to gain processor performance is meeting increasing obstacles. However, significant performance improvements can be achieved by extending the processor with dedicated hardware support, which makes much more effective use of the available transistors. This paper presents a novel hardware support mechanism, called DistTree, to improve processor performance. The DistTree hardware automates gather and scatter operations for applications with complex but predictable memory access patterns, such as the Fast Fourier Transform (FFT). With this hardware support integrated into a modern microprocessor (the Alpha architecture in our experiments), FFT performance more than doubles compared with the FFTW library, a state-of-the-art implementation. By reducing both the arithmetic and address computation overhead, the DistTree hardware support lets the processor spend the majority of its cycles on the computations of the algorithm itself. As a result, the performance of many single-threaded applications can be significantly increased.
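
To make the terms "gather" and "scatter" concrete, the following is a minimal software sketch of a radix-2 FFT in which every stage collects its operands through index vectors, computes on the resulting contiguous values, and writes the results back through the same index vectors. It is an illustration only and does not represent the DistTree design, which the abstract does not detail. The helper names (gather, scatter, bit_reverse_indices, fft_gather_scatter) are hypothetical and chosen for this example. The point is that the index vectors depend only on the transform size and the stage, not on the data, which is the kind of complex but predictable access pattern the abstract refers to.

```python
import cmath


def gather(data, indices):
    """Collect elements at scattered positions into a contiguous list."""
    return [data[i] for i in indices]


def scatter(data, indices, values):
    """Write contiguous values back to scattered positions in data."""
    for i, v in zip(indices, values):
        data[i] = v


def bit_reverse_indices(n):
    """Index vector of the bit-reversal permutation; n is a power of two."""
    bits = n.bit_length() - 1
    out = []
    for i in range(n):
        r = 0
        for b in range(bits):
            if i & (1 << b):
                r |= 1 << (bits - 1 - b)
        out.append(r)
    return out


def fft_gather_scatter(x):
    """Iterative radix-2 decimation-in-time FFT built on gather/scatter steps."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    # Initial reordering is itself a gather with a fixed index vector.
    data = gather(list(x), bit_reverse_indices(n))
    half = 1
    while half < n:
        span = 2 * half
        # Index vectors for this stage: known in advance from n and half.
        top = [s + k for s in range(0, n, span) for k in range(half)]
        bot = [i + half for i in top]
        twiddles = [cmath.exp(-2j * cmath.pi * (i % span) / span) for i in top]
        a = gather(data, top)
        b = gather(data, bot)
        t = [w * v for w, v in zip(twiddles, b)]
        # Butterfly results go back to the same scattered positions.
        scatter(data, top, [u + v for u, v in zip(a, t)])
        scatter(data, bot, [u - v for u, v in zip(a, t)])
        half = span
    return data
```

In this sketch the index lists and the gather/scatter loops are pure address computation overhead executed in software; the arithmetic proper is confined to the twiddle multiplications and the butterfly additions and subtractions.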