Because communication costs are non-uniform in modern parallel computers, mapping virtual parallel processes to physical processors (or cores) in an optimized way is an important problem for achieving scalable performance. Existing work uses profile-guided approaches to automatically derive mapping schemes that minimize the cost of point-to-point communications. However, these approaches cannot handle collective communications and may therefore produce sub-optimal mappings for applications that use them. In this paper, we propose OPP (Optimized Process Placement), an approach that handles collective communications by transforming each collective into the series of point-to-point operations that the communication library actually performs to implement it. Existing mapping approaches can then be used to find placement schemes that are optimized for both point-to-point and collective communications. We evaluated our approach with micro-benchmarks covering all MPI collective communications, the NAS Parallel Benchmark suite, and three other applications. Experimental results show that the process placements generated by our approach achieve significant speedups.
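
To make the decomposition idea concrete, the sketch below (plain C, not taken from the paper) replays offline the point-to-point messages induced by a binomial-tree MPI_Bcast, the small-message broadcast algorithm used by MPICH-style libraries; the process count, root rank, and message size are illustrative assumptions, as is the traffic-matrix representation.

/* Sketch: point-to-point traffic induced by a binomial-tree broadcast.
 * In rank space rotated so the root is 0, each non-root process receives
 * exactly one message, from the process whose relative rank differs in
 * the lowest set bit. */
#include <stdio.h>

#define NPROCS 8          /* assumed process count */

int main(void) {
    static long traffic[NPROCS][NPROCS] = {{0}};  /* bytes sent i -> j */
    int root = 0;                                 /* assumed root rank */
    long msg_bytes = 1024;                        /* assumed payload size */

    for (int rel = 1; rel < NPROCS; rel++) {
        int lowbit = rel & -rel;                  /* lowest set bit of rel */
        int src = (rel - lowbit + root) % NPROCS; /* parent in the tree */
        int dst = (rel + root) % NPROCS;
        traffic[src][dst] += msg_bytes;
    }

    /* Print the resulting communication matrix. */
    for (int i = 0; i < NPROCS; i++) {
        for (int j = 0; j < NPROCS; j++)
            printf("%6ld ", traffic[i][j]);
        printf("\n");
    }
    return 0;
}

The same pattern extends to other collectives: replaying the library's algorithm offline yields per-pair byte counts, which can be merged with profiled point-to-point traffic before running an existing graph-mapping step onto the physical topology.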