Optimization of All-to-All Communication on the Blue Gene/L Supercomputer

Authors:
Sameer Kumar;Yogish Sabharwal;Rahul Garg;Philip Heidelberger
Venue:
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Year:
2008

Abstract

All-to-all communication is a well-known performance bottleneck for many applications, such as those that use the Fast Fourier Transform (FFT) algorithm. We analyze the performance of all-to-all communication on the Blue Gene/L torus interconnect, which has link contention even for all-to-all operations with short messages. We observe that the performance of all-to-all depends on the shape of the processor partition, and we present a performance analysis of all-to-all on partitions of various shapes. We then present optimization schemes that substantially improve the performance of all-to-all with both short and large messages. In particular, throughput improved from 64% to over 99% of peak on the 65,536-node (64 x 32 x 32) Blue Gene/L machine at the Lawrence Livermore National Laboratory. We show the impact of the all-to-all performance optimizations on 1-D and 3-D FFT benchmarks. With our optimized all-to-all, we achieved a performance of over 2.8 TF on the HPC Challenge 1-D FFT benchmark.
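
As a rough illustration of why partition shape matters, the following Python sketch estimates a contention lower bound for all-to-all on a torus. It is not the analysis from the paper: it assumes uniform traffic, shortest-path routing, and load spread evenly over the two directed links per node in each dimension. The function names (mean_ring_distance, relative_alltoall_time) are hypothetical and chosen for this example only. Under these assumptions, the longest dimension of the partition determines the bound.

```python
from fractions import Fraction


def mean_ring_distance(k: int) -> Fraction:
    """Mean shortest-path hop count from one node to every node
    (itself included) on a ring of k nodes."""
    total = sum(min(d, k - d) for d in range(k))
    return Fraction(total, k)


def relative_alltoall_time(shape) -> Fraction:
    """Idealized lower bound on all-to-all time for a torus partition,
    in units of (per-destination message size / link bandwidth).

    Assumptions (not taken from the paper): uniform traffic,
    shortest-path routing, and perfectly balanced load over the
    2 directed links each node owns in every dimension.
    """
    nodes = 1
    for k in shape:
        nodes *= k
    # Hop-messages crossing dimension k: nodes * nodes * mean distance.
    # Directed links in that dimension: 2 * nodes.
    # Per-link load is therefore nodes * mean distance / 2.
    return max(nodes * mean_ring_distance(k) / 2 for k in shape)


if __name__ == "__main__":
    # Three 512-node partitions with different shapes.
    for shape in [(8, 8, 8), (16, 8, 4), (32, 4, 4)]:
        print(shape, relative_alltoall_time(shape))
```

In this simplified model, the three 512-node shapes give bounds of 512, 1024, and 2048 units: each doubling of the longest dimension doubles the estimated time, even though the node count is unchanged. Measured behavior on real hardware will differ from this estimate.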