Advancing application process affinity experimentation: Open MPI's LAMA-based affinity interface

  • Authors:
  • Joshua Hursey; Jeffrey M. Squyres

  • Affiliations:
  • University of Wisconsin-La Crosse, La Crosse, WI; Cisco Systems, Inc., San Jose, CA

  • Venue:
  • Proceedings of the 20th European MPI Users' Group Meeting

  • Year:
  • 2013

Abstract

Application studies have shown that tuning the placement of Message Passing Interface (MPI) processes within a server's non-uniform memory access (NUMA) topology can have a dramatic impact on performance. The performance implications are magnified when a parallel job runs across multiple server nodes, especially for large-scale MPI applications. As processor and NUMA topologies grow more complex to meet the demands of ever-increasing processor core counts, best practices for process placement must also evolve. This paper presents Open MPI's flexible interface for distributing the individual processes of a parallel job across the processing resources of a High Performance Computing (HPC) system, paying particular attention to the internal NUMA topology of each server. The interface is a realization of the Locality-Aware Mapping Algorithm (LAMA) [8], and it provides both simple and complex mechanisms for specifying regular process-to-processor mappings and affinitization. Open MPI's LAMA implementation is intended as a tool for MPI users to experiment with different process placement strategies on both current and emerging HPC platforms.
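
As a concrete illustration of the kind of mapping and binding specification the interface supports, the sketch below shows a hypothetical mpirun invocation using the LAMA MCA parameters (rmaps_lama_map, rmaps_lama_bind, rmaps_lama_mppr, rmaps_lama_order) associated with the LAMA mapper in the Open MPI 1.7 series. Exact option spellings and availability vary by Open MPI version, and my_mpi_app is a placeholder executable, so treat this as an assumed example rather than a definitive command line.

  # Select the LAMA mapper, then lay out 16 processes by iterating over
  # cores (c), sockets (s), boards (b), nodes (n), and hardware threads (h),
  # binding each process to one core with at most one process per core.
  # (Option names assume the Open MPI 1.7 LAMA MCA parameters.)
  mpirun -np 16 \
         --mca rmaps lama \
         --mca rmaps_lama_map csbnh \
         --mca rmaps_lama_bind 1c \
         --mca rmaps_lama_mppr 1:c \
         --mca rmaps_lama_order s \
         ./my_mpi_app

Reordering the letters in the map string (e.g., leading with s instead of c to spread ranks across sockets first) changes the traversal order of the hardware levels and therefore the resulting process layout, which is what makes the interface suited to placement experimentation.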