Topology mapping for Blue Gene/L supercomputer. Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC '06).
IBM Journal of Research and Development.
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications. Proceedings of the 2010 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10).
Near-optimal placement of MPI processes on hierarchical NUMA architectures. Proceedings of the 16th International Euro-Par Conference on Parallel Processing: Part II (Euro-Par '10).
Locality-Aware Parallel Process Mapping for Multi-core HPC Systems. Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER '11).
Application studies have shown that tuning the placement of Message Passing Interface (MPI) processes within a server's non-uniform memory access (NUMA) topology can have a dramatic impact on performance. The performance implications are magnified when a parallel job runs across multiple server nodes, especially for large-scale MPI applications. As processor and NUMA topologies continue to grow more complex to meet the demands of ever-increasing processor core counts, best practices for process placement must also evolve. This paper presents Open MPI's flexible interface for distributing the individual processes of a parallel job across the processing resources of a High Performance Computing (HPC) system, paying particular attention to each server's internal NUMA topology. The interface is a realization of the Locality-Aware Mapping Algorithm (LAMA) [8], and provides both simple and complex mechanisms for specifying regular process-to-processor mappings and affinitization. Open MPI's LAMA implementation is intended as a tool for MPI users to experiment with different process placement strategies on both current and emerging HPC platforms.
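For readers who want to check where a given mapping strategy actually placed their ranks, the short MPI program below prints each rank's host name and CPU affinity mask. It is an illustrative sketch rather than code from the paper: it assumes a Linux system (for sched_getaffinity), and the LAMA-specific MCA parameter names mentioned in the comment (rmaps lama, rmaps_lama_map, rmaps_lama_bind) reflect the Open MPI 1.7-era implementation and should be verified against your installation; Open MPI's --report-bindings option reports similar information from the launcher itself.

```c
/*
 * Illustrative sketch (not from the paper): print where each MPI rank runs
 * and which CPUs it is bound to, so different mapping/binding choices can be
 * compared. Assumes Linux (sched_getaffinity) and an MPI implementation.
 *
 * Example launch using Open MPI's LAMA mapper (parameter names are from the
 * 1.7-era implementation; check them against your installation):
 *   mpirun -np 8 --mca rmaps lama \
 *          --mca rmaps_lama_map csbn --mca rmaps_lama_bind 1c ./where_am_i
 */
#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char host[256];
    gethostname(host, sizeof(host));

    /* Collect the set of CPUs this process is currently allowed to run on. */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof(mask), &mask);

    char cpus[1024] = "";
    size_t used = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE && used < sizeof(cpus) - 16; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) {
            used += snprintf(cpus + used, sizeof(cpus) - used, "%d ", cpu);
        }
    }

    printf("Rank %d of %d on %s bound to CPUs: %s\n", rank, size, host, cpus);

    MPI_Finalize();
    return 0;
}
```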