MPI on a Million Processors

  • Authors and affiliations:
  • Pavan Balaji (Argonne National Laboratory, Argonne, USA 60439)
  • Darius Buntinas (Argonne National Laboratory, Argonne, USA 60439)
  • David Goodell (Argonne National Laboratory, Argonne, USA 60439)
  • William Gropp (University of Illinois, Urbana, USA 61801)
  • Sameer Kumar (IBM T.J. Watson Research Center, Yorktown Heights, USA 10598)
  • Ewing Lusk (Argonne National Laboratory, Argonne, USA 60439)
  • Rajeev Thakur (Argonne National Laboratory, Argonne, USA 60439)
  • Jesper Larsson Träff (NEC Laboratories Europe, Sankt Augustin, Germany)

  • Venue:
  • Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
  • Year:
  • 2009

Abstract

Petascale machines with close to a million processors will soon be available. Although MPI is the dominant programming model today, some researchers and users wonder (and perhaps even doubt) whether MPI will scale to such large processor counts. In this paper, we examine how scalable MPI is. We first examine the MPI specification itself, discussing the areas that raise scalability concerns and how those concerns can be overcome. We then investigate the issues that an MPI implementation must address to be scalable. We ran experiments to measure MPI memory consumption at scale on up to 131,072 processes, or 80% of the IBM Blue Gene/P system at Argonne National Laboratory. Based on the results, we tuned the MPI implementation to reduce its memory footprint. We also discuss issues in application algorithmic scalability to large process counts and features of MPI that enable the use of other techniques to overcome scalability limitations in applications.