Near-optimal placement of MPI processes on hierarchical NUMA architectures

Authors:
Emmanuel Jeannot; Guillaume Mercier
Affiliations:
LaBRI and INRIA Bordeaux Sud-Ouest; LaBRI and INRIA Bordeaux Sud-Ouest and Institut Polytechnique de Bordeaux
Venue:
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Year:
2010

Abstract

MPI process placement can play a decisive role in application performance. This is especially true on today's architectures (heterogeneous, multicore, with several levels of cache, etc.). In this paper, we describe a novel algorithm called TreeMatch that maps processes to resources in order to reduce the communication cost of the whole application. We have implemented this algorithm and discuss its performance both in simulation and on the NAS benchmarks.
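The abstract states the goal (choose where each process runs so that the whole application's communication cost drops) but not the algorithm itself. The Python sketch below makes that goal concrete under explicit assumptions. It is an illustration only, not TreeMatch as published. It assumes one process per core and a balanced tree topology described by a hypothetical list `arities` of per-level fan-outs. It also assumes a cost model in which the traffic between two processes is weighted by how many tree levels separate their cores. `best_mapping` searches all n! placements exhaustively, so it is usable only on toy inputs. `greedy_mapping` is a simple bottom-up grouping heuristic that tries to keep heavily communicating processes under the same subtree. All function and variable names are hypothetical.

```python
# Hypothetical illustration of hierarchy-aware process placement;
# this is NOT the published TreeMatch algorithm.
from itertools import permutations
from math import prod


def leaf_distance(a, b, arities):
    """Levels climbed from leaf a until it shares an ancestor with leaf b.

    Assumed topology: a balanced tree whose leaves (cores) are numbered
    left to right; arities[0] is the fan-out just above the leaves
    (e.g. cores per socket), arities[1] the next level up, and so on.
    """
    dist = 0
    while a != b:
        a //= arities[dist]
        b //= arities[dist]
        dist += 1
    return dist


def mapping_cost(comm, mapping, arities):
    """Assumed cost model: sum of comm[i][j] * distance over pairs i < j.

    mapping[p] is the leaf (core) assigned to process p.
    """
    n = len(comm)
    return sum(comm[i][j] * leaf_distance(mapping[i], mapping[j], arities)
               for i in range(n) for j in range(i + 1, n))


def best_mapping(comm, arities):
    """Exhaustive search over all n! placements: only usable for tiny n."""
    return min(permutations(range(len(comm))),
               key=lambda m: mapping_cost(comm, m, arities))


def group_greedily(comm, k):
    """Split indices into groups of size k, greedily adding the free index
    that exchanges the most data with the group built so far."""
    free = list(range(len(comm)))
    groups = []
    while free:
        group = [free.pop(0)]
        while len(group) < k:
            nxt = max(free, key=lambda j: sum(comm[i][j] for i in group))
            free.remove(nxt)
            group.append(nxt)
        groups.append(group)
    return groups


def aggregate(comm, groups):
    """Traffic matrix between groups (intra-group traffic is dropped)."""
    return [[0 if gi == hi else sum(comm[i][j] for i in g for j in h)
             for hi, h in enumerate(groups)]
            for gi, g in enumerate(groups)]


def greedy_mapping(comm, arities):
    """Bottom-up heuristic: group processes level by level, then read the
    final grouping order as the leaf (core) order."""
    assert len(comm) == prod(arities), "one process per leaf assumed"
    members = [[p] for p in range(len(comm))]  # processes in each group
    matrix = comm
    for k in arities:
        groups = group_greedily(matrix, k)
        members = [[p for g in grp for p in members[g]] for grp in groups]
        matrix = aggregate(matrix, groups)
    mapping = [0] * len(comm)
    for leaf, p in enumerate(members[0]):
        mapping[p] = leaf
    return mapping


# Toy input: 2 sockets x 2 cores; pairs (0, 2) and (1, 3) talk the most.
ARITIES = [2, 2]
EXAMPLE_COMM = [
    [0, 1, 10, 1],
    [1, 0, 1, 10],
    [10, 1, 0, 1],
    [1, 10, 1, 0],
]

if __name__ == "__main__":
    print(mapping_cost(EXAMPLE_COMM, list(range(4)), ARITIES))  # 46
    greedy = greedy_mapping(EXAMPLE_COMM, ARITIES)
    print(greedy, mapping_cost(EXAMPLE_COMM, greedy, ARITIES))  # [0, 2, 1, 3] 28
    best = best_mapping(EXAMPLE_COMM, ARITIES)
    print(mapping_cost(EXAMPLE_COMM, best, ARITIES))  # 28
```

On this toy input, the identity placement splits both heavily communicating pairs across sockets (cost 46). The greedy grouping co-locates each pair on one socket and matches the exhaustive optimum (cost 28). A greedy heuristic of this kind gives no such guarantee in general.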