Optimizing threaded MPI execution on SMP clusters

  • Authors:
  • Hong Tang; Tao Yang

  • Affiliations:
  • Department of Computer Science, University of California, Santa Barbara, CA; Department of Computer Science, University of California, Santa Barbara, CA

  • Venue:
  • ICS '01 Proceedings of the 15th international conference on Supercomputing
  • Year:
  • 2001

Abstract

Our previous work has shown that using threads to execute MPI programs can yield substantial performance gains on multiprogrammed shared-memory machines. This paper investigates the design and implementation of a thread-based MPI system on SMP clusters. Our study indicates that, with a proper design for threaded MPI execution, both point-to-point and collective communication performance can be improved substantially compared to a process-based MPI implementation in a cluster environment. Our contributions include a hierarchy-aware and adaptive communication scheme for threaded MPI execution and a thread-safe network device abstraction that uses event-driven synchronization and provides separate channels for collective and point-to-point communication. This paper describes the implementation of our design and illustrates its performance advantage on a Linux SMP cluster.
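
To make the idea of hierarchy-aware collective communication concrete, the following is a minimal sketch of a two-level broadcast plan. It is not the authors' scheme: the function names, the rank-to-node mapping `node_of_rank`, and the choice of the lowest rank on each node as its leader are all assumptions made for illustration. The sketch only counts which transfers would cross the network and which would stay inside one SMP node, where threads of the same process can share memory.

```python
from collections import defaultdict


def hierarchical_broadcast_plan(node_of_rank, root):
    """Plan a two-level broadcast (illustrative, hypothetical helper).

    node_of_rank[r] is the assumed node id hosting MPI rank r.
    Returns (inter_node_sends, intra_node_copies) as lists of (src, dst) ranks.
    """
    ranks_on_node = defaultdict(list)
    for rank, node in enumerate(node_of_rank):
        ranks_on_node[node].append(rank)

    root_node = node_of_rank[root]
    inter_node_sends = []   # transfers that cross the network
    intra_node_copies = []  # transfers that stay within one SMP node
    for node, ranks in ranks_on_node.items():
        # Assumption: the root leads its own node; elsewhere the lowest rank leads.
        leader = root if node == root_node else ranks[0]
        if node != root_node:
            inter_node_sends.append((root, leader))
        for rank in ranks:
            if rank != leader:
                intra_node_copies.append((leader, rank))
    return inter_node_sends, intra_node_copies


def flat_inter_node_sends(node_of_rank, root):
    """Count network messages when the root sends to every other rank directly."""
    root_node = node_of_rank[root]
    return sum(1 for node in node_of_rank if node != root_node)


# Example: 8 ranks on 2 nodes with 4 ranks each, broadcasting from rank 0.
layout = [0, 0, 0, 0, 1, 1, 1, 1]
inter, intra = hierarchical_broadcast_plan(layout, root=0)
print(len(inter), "network sends (two-level) vs", flat_inter_node_sends(layout, 0), "(flat)")
```

In this layout the two-level plan needs one network send per remote node, while a flat broadcast from the root needs one per remote rank. An adaptive scheme could additionally choose between such plans at run time, which this sketch does not attempt.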