In high-performance computing on distributed-memory systems, communication often accounts for a significant part of overall execution time. The relative cost of communication will certainly continue to rise as compute density grows in line with current technology and industry trends, so the design of lower-communication alternatives to fundamental computational algorithms has become an important field of research. For the distributed 1-D FFT, communication cost has so far remained high because all industry-standard implementations perform three all-to-all internode data exchanges (also called global transposes), and these communication steps dominate execution time. In this paper, we present a mathematical framework from which many easy-to-implement 1-D FFT algorithms that need only a single all-to-all exchange can be derived. For large-scale problems, our implementation can be twice as fast as leading FFT libraries on state-of-the-art computer clusters. Moreover, our framework allows a tradeoff between accuracy and performance, further boosting performance when reduced accuracy is acceptable.
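To make the three global transposes mentioned above concrete, the following is a minimal single-process sketch of the conventional transpose-based decomposition of a length-N DFT with N = n1 * n2. It illustrates the baseline pattern only, not the single-all-to-all framework this paper proposes. The names `dft`, `transpose`, and `six_step_dft` are hypothetical, the naive `dft` stands in for a local FFT kernel, and each call to `transpose` marks a point where a distributed implementation would presumably perform an all-to-all exchange.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT; a stand-in for a node-local FFT kernel."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Matrix transpose; stands in for a global transpose (all-to-all)."""
    return [list(col) for col in zip(*m)]


def six_step_dft(x, n1, n2):
    """Length n1*n2 DFT via the conventional three-transpose decomposition."""
    n = n1 * n2
    assert len(x) == n
    # Natural layout: n2 rows of n1 consecutive elements each.
    rows = [x[r * n1:(r + 1) * n1] for r in range(n2)]
    # Transpose 1: gather the stride-n1 subsequences, a[i][j] = x[i + n1*j].
    a = transpose(rows)
    # Local DFTs of length n2 along each row, giving a[i][k2].
    a = [dft(row) for row in a]
    # Twiddle factors exp(-2*pi*i * i*k2 / n) couple the two stages.
    a = [
        [a[i][k2] * cmath.exp(-2j * cmath.pi * i * k2 / n) for k2 in range(n2)]
        for i in range(n1)
    ]
    # Transpose 2: make the n1 direction contiguous, b[k2][i].
    b = transpose(a)
    # Local DFTs of length n1 along each row, giving b[k2][k1].
    b = [dft(row) for row in b]
    # Transpose 3: restore natural output order, c[k1][k2] = X[n2*k1 + k2].
    c = transpose(b)
    return [value for row in c for value in row]
```

Calling `six_step_dft(x, 3, 4)` on a 12-element sequence should agree with `dft(x)` up to rounding error. In this sketch the arithmetic between transposes touches only one row at a time, which is why the transposes are the only steps that would require internode communication.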