A major trend in HPC is the shift toward manycore, where systems are composed of shared-memory nodes featuring numerous processing units. Unfortunately, with scale comes complexity, here in the form of non-uniform memory access (NUMA) and deep cache hierarchies. For most HPC applications, harnessing the power of multicore nodes is hindered by topology-oblivious tuning of the MPI library. In this paper, we propose a framework that tunes every type of shared-memory communication according to process locality and hardware topology. An implementation inside Open MPI is evaluated experimentally and demonstrates significant speedups over vanilla Open MPI and MPICH2.
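To illustrate the kind of locality-driven decision the abstract describes, the following minimal C sketch (not the paper's actual framework) uses hwloc to find the deepest hardware object shared by the processing units two ranks are bound to, and selects a copy strategy from it. The PU indices and the strategy labels are illustrative assumptions; hwloc >= 2.0 is assumed for hwloc_obj_type_is_cache.

    #include <hwloc.h>
    #include <stdio.h>

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* Assume two communicating ranks bound to PU 0 and PU 1 (illustrative). */
        hwloc_obj_t pu0 = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, 0);
        hwloc_obj_t pu1 = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, 1);
        if (!pu0 || !pu1) { hwloc_topology_destroy(topo); return 1; }

        /* The deepest object containing both PUs tells us which resource they share. */
        hwloc_obj_t anc = hwloc_get_common_ancestor_obj(topo, pu0, pu1);

        if (anc->type == HWLOC_OBJ_CORE)
            puts("hyperthread siblings: cache-resident copies, small chunks");
        else if (hwloc_obj_type_is_cache(anc->type))
            puts("shared cache: pipelined copy sized to the shared level");
        else
            puts("cross-socket/NUMA: large pipeline or kernel-assisted copy (e.g. KNEM/LiMIC)");

        hwloc_topology_destroy(topo);
        return 0;
    }

Compile with cc locality.c -lhwloc. In a real MPI library the same common-ancestor query would be made per pair of on-node ranks, and the resulting locality class would index into per-class tuning parameters (chunk size, pipeline depth, copy mechanism) rather than a printed label.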