The MPI standard offers a set of topology-aware interfaces for constructing graph and Cartesian topologies for MPI applications. These interfaces have mostly been used for topology construction rather than for performance improvement. In this paper, we use graph embedding together with node and network architecture discovery modules to match an application's communication topology to the physical topology of multi-core clusters with multi-level networks. Micro-benchmark results show considerable improvement in communication performance with weighted, network-aware mapping. We also show that the implementation can improve both the communication time and the execution time of applications.
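To make the idea of weighted, topology-aware mapping concrete, the following is a minimal sketch and not the method used in the paper. It assumes a hypothetical two-level machine model (cores grouped into nodes, with distance 1 inside a node and 2 across nodes) and a simple greedy heuristic that places the heaviest-communicating rank pairs first. The names `distance`, `mapping_cost`, and `greedy_map` are illustrative and do not correspond to any MPI or library API.

```python
def distance(core_a, core_b, cores_per_node):
    """Assumed two-level distance: 0 same core, 1 same node, 2 across nodes."""
    if core_a == core_b:
        return 0
    same_node = core_a // cores_per_node == core_b // cores_per_node
    return 1 if same_node else 2


def mapping_cost(weights, placement, cores_per_node):
    """Sum of (traffic weight x physical distance) over communicating pairs."""
    return sum(
        w * distance(placement[a], placement[b], cores_per_node)
        for (a, b), w in weights.items()
    )


def _free_in_node(core, free, cores_per_node):
    """Number of still-free cores on the node that hosts `core`."""
    node = core // cores_per_node
    return sum(1 for c in free if c // cores_per_node == node)


def greedy_map(num_ranks, weights, cores_per_node):
    """Greedy heuristic (an assumption, not the paper's algorithm):
    visit edges by decreasing weight and put each unplaced rank on the
    free core closest to its already-placed partner."""
    free = list(range(num_ranks))  # one core per rank
    placement = {}
    for (a, b), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        for rank, partner in ((a, b), (b, a)):
            if rank in placement:
                continue
            if partner in placement:
                target = placement[partner]
                core = min(
                    free,
                    key=lambda c: (distance(c, target, cores_per_node), c),
                )
            else:
                # No placed partner yet: start on the emptiest node.
                core = max(
                    free,
                    key=lambda c: (_free_in_node(c, free, cores_per_node), -c),
                )
            placement[rank] = core
            free.remove(core)
    # Ranks that never communicate take whatever cores remain.
    for rank in range(num_ranks):
        if rank not in placement:
            placement[rank] = free.pop(0)
    return placement


# Hypothetical 4-rank communication graph on 2 nodes with 2 cores each.
weights = {(0, 2): 10, (1, 3): 10, (0, 1): 1, (2, 3): 1}
identity = {r: r for r in range(4)}
mapped = greedy_map(4, weights, cores_per_node=2)
print(mapping_cost(weights, identity, 2))  # 42
print(mapping_cost(weights, mapped, 2))    # 24
```

In this toy case the default rank-to-core assignment sends both heavy edges across nodes, while the greedy placement keeps them inside a node. In an MPI program, a permutation of this kind is what a topology creation call with reordering enabled is permitted to apply; the distance model here is a stand-in for what hardware and network discovery would supply.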