PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
We consider FFTs with a 2D data decomposition on networks whose nodes contain multiple processors. In this application, processors perform collective all-to-all communication in different groups independently and at the same time, so the individual processors of one node may be involved in separate, independent collectives. The underlying communication algorithm should take this into account. For short messages, we propose a sparse version of Bruck's algorithm that handles such multiple collectives. We discuss the distribution of the FFT data to the nodes for the local and global application of Bruck's original algorithm as well as for the suggested sparse version, and compare the performance of the different approaches.
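For orientation, the following is a minimal sketch of Bruck's original all-to-all algorithm, simulated sequentially in plain Python. It does not reproduce the sparse version proposed here, nor the mapping of FFT data to nodes. The function name `bruck_alltoall`, the list-of-lists buffer layout, and the returned round count are assumptions made purely for illustration.

```python
def bruck_alltoall(send):
    """Simulate Bruck's all-to-all among p ranks (illustrative sketch).

    send[i][j] is the block that rank i wants to deliver to rank j.
    Returns (recv, rounds), where recv[i][s] is the block that rank i
    received from rank s, and rounds is the number of communication steps.
    """
    p = len(send)

    # Phase 1: local rotation. Afterwards, position j on rank i holds
    # the block destined for rank (i + j) mod p.
    buf = [[send[i][(i + j) % p] for j in range(p)] for i in range(p)]

    # Phase 2: about log2(p) rounds. In the round with distance k, every
    # rank forwards all positions j whose binary representation has bit k
    # set to rank (i + k) mod p, and receives the same positions from
    # rank (i - k) mod p.
    rounds = 0
    k = 1
    while k < p:
        old = [row[:] for row in buf]  # snapshot: all ranks send at once
        for i in range(p):
            src = (i - k) % p
            for j in range(p):
                if j & k:
                    buf[i][j] = old[src][j]
        k <<= 1
        rounds += 1

    # Phase 3: local reordering. Position j on rank i now holds the
    # block that originated at rank (i - j) mod p.
    recv = [[None] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            recv[i][(i - j) % p] = buf[i][j]
    return recv, rounds


if __name__ == "__main__":
    p = 6
    send = [[(src, dst) for dst in range(p)] for src in range(p)]
    recv, rounds = bruck_alltoall(send)
    print("rounds:", rounds)
    print("rank 2 received:", recv[2])
```

Each rank takes part in roughly log2(p) message exchanges instead of p - 1, at the price of forwarding blocks several times, which is why the approach is attractive for short messages. In the 2D decomposition discussed above, one such exchange would run independently inside every processor group.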