MPI process placement can play a decisive role in application performance. This is especially true on today's architectures (heterogeneous, multicore, with several levels of cache, etc.). In this paper, we describe a novel algorithm called TreeMatch that maps processes to resources in order to reduce the communication cost of the whole application. We have implemented this algorithm and evaluate its performance both in simulation and on the NAS benchmarks.
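To make the placement problem concrete, the following is a minimal sketch of how the communication cost of a process-to-core mapping might be scored on a tree-shaped machine. It is not the TreeMatch algorithm: the cost metric (traffic volume multiplied by the number of tree levels climbed to reach a common ancestor), the balanced-tree model, and all names (`tree_distance`, `communication_cost`, `best_placement`, `arities`) are assumptions made for illustration, and the search is a brute-force enumeration that is only feasible for a handful of processes.

```python
from itertools import permutations


def tree_distance(a, b, arities):
    """Levels to climb from leaves a and b to their lowest common ancestor.

    arities[k] is the number of children of every node at depth k
    (depth 0 is the root), so the tree is assumed to be balanced.
    """
    hops = 0
    for arity in reversed(arities):
        if a == b:
            break
        a //= arity  # move both leaves up to their parent
        b //= arity
        hops += 1
    return hops


def communication_cost(comm, placement, arities):
    """Sum of traffic volume times tree distance over all process pairs.

    comm[i][j] is the (symmetric) volume exchanged by processes i and j;
    placement[i] is the leaf (core) index assigned to process i.
    """
    n = len(comm)
    return sum(
        comm[i][j] * tree_distance(placement[i], placement[j], arities)
        for i in range(n)
        for j in range(i + 1, n)
    )


def best_placement(comm, arities):
    """Exhaustive search over all placements (illustration only)."""
    n = len(comm)
    return min(
        permutations(range(n)),
        key=lambda p: communication_cost(comm, p, arities),
    )


# Hypothetical machine: 2 nodes with 2 cores each (leaves 0..3).
arities = [2, 2]
# Hypothetical traffic: processes 0 and 2 talk heavily, as do 1 and 3.
comm = [
    [0, 1, 100, 1],
    [1, 0, 1, 100],
    [100, 1, 0, 1],
    [1, 100, 1, 0],
]

identity = (0, 1, 2, 3)
best = best_placement(comm, arities)
identity_cost = communication_cost(comm, identity, arities)
best_cost = communication_cost(comm, best, arities)
print(identity_cost, best_cost)
```

In this toy case the rank-order placement separates the two heavily communicating pairs across nodes, while the best placement found puts each pair on the same node. Under the assumed metric, this is the kind of gain a placement algorithm aims for.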