MPI on a Million Processors

Authors:
Pavan Balaji;Darius Buntinas;David Goodell;William Gropp;Sameer Kumar;Ewing Lusk;Rajeev Thakur;Jesper Larsson Träff
Affiliations:
Argonne National Laboratory, Argonne, USA 60439;Argonne National Laboratory, Argonne, USA 60439;Argonne National Laboratory, Argonne, USA 60439;University of Illinois, Urbana, USA 61801;IBM T.J. Watson Research Center, Yorktown Heights, USA 10598;Argonne National Laboratory, Argonne, USA 60439;Argonne National Laboratory, Argonne, USA 60439;NEC Laboratories Europe, Sankt Augustin, Germany
Venue:
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Year:
2009

Abstract

Petascale machines with close to a million processors will soon be available. Although MPI is the dominant programming model today, some researchers and users wonder (and perhaps even doubt) whether MPI will scale to such large processor counts. In this paper, we examine how scalable MPI is. We first examine the MPI specification itself, identifying areas that raise scalability concerns and discussing how they can be overcome. We then investigate the issues that an MPI implementation must address to be scalable. We ran experiments to measure MPI memory consumption at scale on up to 131,072 processes, or 80% of the IBM Blue Gene/P system at Argonne National Laboratory. Based on the results, we tuned the MPI implementation to reduce its memory footprint. We also discuss issues in application algorithmic scalability to large process counts and features of MPI that enable the use of other techniques to overcome scalability limitations in applications.
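As a rough illustration of why memory footprint becomes a concern at these process counts, the sketch below estimates how much memory is consumed when every process keeps a fixed amount of state for every other process. This is not the measurement method used in the paper. The per-peer cost of 4 bytes and the function names are hypothetical, chosen only to show how the totals grow with the number of processes.

```python
def per_process_bytes(num_procs: int, bytes_per_peer: int) -> int:
    """Bytes one process needs if it stores bytes_per_peer for each process."""
    return num_procs * bytes_per_peer


def aggregate_bytes(num_procs: int, bytes_per_peer: int) -> int:
    """Bytes summed over all processes; grows quadratically in num_procs."""
    return num_procs * per_process_bytes(num_procs, bytes_per_peer)


PROCS = 131_072      # largest process count reported in the abstract
BYTES_PER_PEER = 4   # hypothetical: one 32-bit integer of state per peer

if __name__ == "__main__":
    for p in (1_024, 16_384, PROCS):
        per_proc_kib = per_process_bytes(p, BYTES_PER_PEER) / 2**10
        total_gib = aggregate_bytes(p, BYTES_PER_PEER) / 2**30
        print(f"{p:>7} processes: {per_proc_kib:8.1f} KiB per process, "
              f"{total_gib:8.3f} GiB in total")
```

Under these assumptions, a single 4-byte entry per peer already costs 512 KiB per process and 64 GiB across the machine at 131,072 processes, which suggests why data structures whose size is linear in the process count deserve attention at this scale.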