Current implementations of process groups (subcommunicators) have non-scalable, O(group size) memory footprints and even worse time complexities for setting up communication. We propose system-ranked process groups, in which member ranks are assigned by the runtime system, as a cheaper and faster alternative for a subset of collective operations (barrier, broadcast, reduction, allreduce). This paper presents two distributed algorithms for constructing balanced, k-ary spanning trees over system-ranked process groups obtained by splitting a parent group. Our schemes have much smaller memory footprints and also perform better, even at modest process counts. We demonstrate performance results on up to 131,072 cores of Blue Gene/P.
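To make the tree structure concrete, the following is a minimal sketch (in C) of the local parent/child computation for a balanced k-ary spanning tree rooted at rank 0. It is not the paper's distributed construction algorithm; it only assumes that the runtime has already assigned contiguous system ranks 0..n-1 to the group members, after which each member can derive its tree neighbors locally.

```c
/* Minimal sketch (not the paper's algorithm): local parent/child
 * computation for a balanced k-ary spanning tree rooted at rank 0,
 * assuming contiguous system ranks 0..n-1 within the group. */
#include <stdio.h>

/* Parent of rank r in a k-ary tree rooted at 0; the root has no parent. */
static int kary_parent(int r, int k) {
    return (r == 0) ? -1 : (r - 1) / k;
}

/* Children of rank r are ranks k*r+1 .. k*r+k that fall below n.
 * Returns the number of children written into the children array. */
static int kary_children(int r, int k, int n, int *children) {
    int count = 0;
    for (int i = 1; i <= k; i++) {
        int c = k * r + i;
        if (c < n) children[count++] = c;
    }
    return count;
}

int main(void) {
    int k = 3, n = 10, children[3];
    for (int r = 0; r < n; r++) {
        int nc = kary_children(r, k, n, children);
        printf("rank %d: parent %d, %d child(ren)\n", r, kary_parent(r, k), nc);
    }
    return 0;
}
```

Because every member can compute its parent and children from its own rank, the group size, and the arity k, a tree-based barrier, broadcast, or reduction needs no per-member membership table, which is the source of the memory savings claimed above.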