Topology mapping for Blue Gene/L supercomputer. Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC '06).
IBM Journal of Research and Development.
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications. Proceedings of the 2010 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing (PDP '10).
Near-optimal placement of MPI processes on hierarchical NUMA architectures. Proceedings of the 16th International Euro-Par Conference on Parallel Processing: Part II (Euro-Par '10).
Locality-Aware Parallel Process Mapping for Multi-core HPC Systems. Proceedings of the 2011 IEEE International Conference on Cluster Computing (CLUSTER '11).
Application studies have shown that tuning the placement of Message Passing Interface (MPI) processes within a server's non-uniform memory access (NUMA) topology can have a dramatic impact on performance. The performance implications are magnified when a parallel job runs across multiple server nodes, especially for large-scale MPI applications. As processor and NUMA topologies continue to grow more complex to meet the demands of ever-increasing processor core counts, best practices for process placement must also evolve. This paper presents Open MPI's flexible interface for distributing the individual processes of a parallel job across the processing resources of a High Performance Computing (HPC) system, paying particular attention to each server's internal NUMA topology. The interface is a realization of the Locality-Aware Mapping Algorithm (LAMA) [8], and provides both simple and complex mechanisms for specifying regular process-to-processor mappings and affinitization. Open MPI's LAMA implementation is intended as a tool for MPI users to experiment with different process placement strategies on both current and emerging HPC platforms.
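For readers who want to check where a given mapping strategy actually placed their ranks, the short MPI program below prints each rank's host name and CPU affinity mask. It is an illustrative sketch rather than code from the paper: it assumes a Linux system (for sched_getaffinity), and the LAMA-specific MCA parameter names mentioned in the comment (rmaps lama, rmaps_lama_map, rmaps_lama_bind) reflect the Open MPI 1.7-era implementation and should be verified against your installation; Open MPI's --report-bindings option reports similar information from the launcher itself.

```c
/*
 * Illustrative sketch (not from the paper): print where each MPI rank runs
 * and which CPUs it is bound to, so different mapping/binding choices can be
 * compared. Assumes Linux (sched_getaffinity) and an MPI implementation.
 *
 * Example launch using Open MPI's LAMA mapper (parameter names are from the
 * 1.7-era implementation; check them against your installation):
 *   mpirun -np 8 --mca rmaps lama \
 *          --mca rmaps_lama_map csbn --mca rmaps_lama_bind 1c ./where_am_i
 */
#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char host[256];
    gethostname(host, sizeof(host));

    /* Collect the set of CPUs this process is currently allowed to run on. */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof(mask), &mask);

    char cpus[1024] = "";
    size_t used = 0;
    for (int cpu = 0; cpu < CPU_SETSIZE && used < sizeof(cpus) - 16; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) {
            used += snprintf(cpus + used, sizeof(cpus) - used, "%d ", cpu);
        }
    }

    printf("Rank %d of %d on %s bound to CPUs: %s\n", rank, size, host, cpus);

    MPI_Finalize();
    return 0;
}
```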