Towards an Efficient Process Placement Policy for MPI Applications in Multicore Environments

Authors:
Guillaume Mercier;Jérôme Clet-Ortega
Affiliations:
Université de Bordeaux - INRIA - LaBRI, Talence cedex F-33405;Université de Bordeaux - INRIA - LaBRI, Talence cedex F-33405
Venue:
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Year:
2009

Abstract

This paper presents a method to efficiently place MPI processes on multicore machines. Because MPI implementations often provide efficient support for both shared-memory and network communication, an adequate placement policy is crucial to improving application performance. As a case study, we show the results obtained for several NAS computing kernels and explain how the policy influences overall performance. In particular, we found that a policy that merely increases the intranode communication ratio is not enough, and that cache utilization is also an influential factor. A more sophisticated policy (e.g., one that takes into account the architecture's memory structure) is required to observe performance improvements.
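
The distinction between the intranode communication ratio and cache-level locality can be illustrated with a small sketch. This is not the paper's placement method: the communication matrix, the two placements, and the names `traffic_breakdown`, `split_pairs`, and `paired_on_cache` are all hypothetical, chosen only to show how two placements with the same intranode ratio can differ in how much traffic flows between processes sharing a cache.

```python
from typing import Dict, List, Tuple

# Hypothetical placement: MPI rank -> (node id, shared-cache id within that node).
Placement = Dict[int, Tuple[int, int]]


def traffic_breakdown(comm: List[List[float]], placement: Placement) -> Dict[str, float]:
    """Return the fraction of traffic exchanged through a shared cache,
    within a node but across caches, and between nodes."""
    totals = {"shared_cache": 0.0, "same_node": 0.0, "inter_node": 0.0}
    n = len(comm)
    for src in range(n):
        for dst in range(n):
            if src == dst:
                continue
            node_s, cache_s = placement[src]
            node_d, cache_d = placement[dst]
            if node_s != node_d:
                totals["inter_node"] += comm[src][dst]
            elif cache_s != cache_d:
                totals["same_node"] += comm[src][dst]
            else:
                totals["shared_cache"] += comm[src][dst]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {key: 0.0 for key in totals}
    return {key: value / grand_total for key, value in totals.items()}


# Illustrative (made-up) traffic: ranks 0-1 and 2-3 exchange a lot,
# ranks 0-2 and 1-3 exchange little.
comm = [
    [0, 10, 1, 0],
    [10, 0, 0, 1],
    [1, 0, 0, 10],
    [0, 1, 10, 0],
]

# Both placements keep every rank on node 0, so the intranode ratio is 1.0
# in both cases; only the assignment to shared caches differs.
split_pairs = {0: (0, 0), 1: (0, 1), 2: (0, 0), 3: (0, 1)}
paired_on_cache = {0: (0, 0), 1: (0, 0), 2: (0, 1), 3: (0, 1)}

naive = traffic_breakdown(comm, split_pairs)
cache_aware = traffic_breakdown(comm, paired_on_cache)
print(naive)
print(cache_aware)
```

Under these assumptions, both placements send no traffic between nodes, yet the second places the heavily communicating pairs on the same shared cache. The sketch only measures where traffic flows; it does not predict run time, and whether sharing a cache helps or hurts a given kernel is exactly the kind of question the paper's experiments address.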