Improving MPI applications performance on multicore clusters with rank reordering

Authors:
Guillaume Mercier; Emmanuel Jeannot
Affiliations:
Université de Bordeaux, INRIA, LaBRI, Talence, France; Université de Bordeaux, INRIA, LaBRI, Talence, France
Venue:
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Year:
2011


Abstract

Modern hardware architectures featuring multicore processors and complex memory hierarchies raise challenges that parallel application programmers must address. It is therefore tempting to adapt an application's communication pattern to the characteristics of the underlying hardware. The MPI standard provides several functions that allow the ranks of MPI processes to be reordered according to a graph attached to a newly created communicator. In this paper, we explain how the MPICH2 implementation of the MPI_Dist_graph_create function was modified to reorder MPI process ranks so that the application's communication pattern matches the hardware topology. Experimental results on a multicore cluster show that improvements can be achieved as long as the application's communication pattern is expressed by a relevant metric.
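
To make the goal of rank reordering concrete, the following is a minimal sketch in Python of the kind of objective a reordering could reduce. It is not the algorithm used in the paper, which the abstract does not describe. The function name `inter_node_volume`, the 4-rank communication matrix, and the two-node layout are all hypothetical values chosen for illustration, and "volume exchanged between nodes" is only one possible metric.

```python
# Toy illustration of rank reordering (hypothetical data, not the paper's method).

def inter_node_volume(comm, node_of_rank):
    """Sum the traffic exchanged between ranks placed on different nodes."""
    n = len(comm)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            if node_of_rank[i] != node_of_rank[j]:
                total += comm[i][j]
    return total

# Assumed symmetric communication matrix:
# comm[i][j] is the volume exchanged between ranks i and j.
comm = [
    [0, 1, 10, 1],
    [1, 0, 1, 10],
    [10, 1, 0, 1],
    [1, 10, 1, 0],
]

# Assumed hardware: two nodes with two cores each.
# Default placement: ranks 0 and 1 on node 0, ranks 2 and 3 on node 1.
default_placement = [0, 0, 1, 1]
# Reordered placement: the heavily communicating pairs (0, 2) and (1, 3)
# each share a node.
reordered_placement = [0, 1, 0, 1]

default_cost = inter_node_volume(comm, default_placement)
reordered_cost = inter_node_volume(comm, reordered_placement)
print(default_cost, reordered_cost)
```

In an MPI program, the application does not compute such a placement itself. It describes its communication graph and passes a nonzero `reorder` argument to `MPI_Dist_graph_create`, which permits the implementation to assign different ranks in the new communicator.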