Enabling concurrent multithreaded MPI communication on multicore petascale systems

  • Authors:
  • Gábor Dózsa;Sameer Kumar;Pavan Balaji;Darius Buntinas;David Goodell;William Gropp;Joe Ratterman;Rajeev Thakur

  • Affiliations:
  • IBM T. J. Watson Research Center, Yorktown Heights, NY;IBM T. J. Watson Research Center, Yorktown Heights, NY;Argonne National Laboratory, Argonne, IL;Argonne National Laboratory, Argonne, IL;Argonne National Laboratory, Argonne, IL;University of Illinois, Urbana, IL;IBM Systems and Technology Group, Rochester, MN;Argonne National Laboratory, Argonne, IL

  • Venue:
  • EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
  • Year:
  • 2010

Quantified Score

Hi-index 0.01

Visualization

Abstract

With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across nodes. Achieving high performance when a large number of concurrent threads make MPI calls is a challenging task for an MPI implementation. We describe the design and implementation of our solution in MPICH2 to achieve high-performance multithreaded communication on the IBM Blue Gene/P. We use a combination of a multichannel-enabled network interface, fine-grained locks, lock-free atomic operations, and specially designed queues to provide a high degree of concurrent access while still maintaining MPI's message-ordering semantics. We present performance results that demonstrate that our new design improves the multithreaded message rate by a factor of 3.6 compared with the existing implementation on the BG/P. Our solutions are also applicable to other high-end systems that have parallel network access capabilities.