Topology aware task mapping techniques: an API and case study

Authors:
Abhinav Bhatelé;Eric Bohm;Laxmikant V. Kalé
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA;University of Illinois at Urbana-Champaign, Urbana, IL, USA
Venue:
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2009


Abstract

Optimal network performance is critical to efficient parallel scaling for communication-bound applications on large machines. With wormhole routing, no-load latencies do not increase significantly with the number of hops traveled. Yet we and others have recently shown that, in the presence of contention, message latencies can grow substantially. Hence, task mapping strategies on large machines should take the machine's topology into account. This poster presents a uniform API that provides topology information on 3D torus machines such as IBM Blue Gene and Cray XT. We present techniques that use this API to improve performance. User-level codes can call the API at runtime to obtain information about their allocated partitions, which is essential for mapping. We motivate the importance of considering network topology using a simple 3D Stencil kernel. We then present mapping strategies for a production code, OpenAtom, running on three-dimensional torus and mesh topologies. OpenAtom presents complex communication scenarios involving interactions between multiple groups of objects. Results are presented for 3D Stencil and OpenAtom on up to 16,384 processors of Blue Gene/L, 8,192 processors of Blue Gene/P, and 2,048 processors of Cray XT3.
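
To make the motivation concrete, the following is a minimal sketch, not the API described in the poster, of how hop counts on a 3D torus might be compared for two mappings of a periodic 3D stencil. All function names (`torus_hops`, `stencil_neighbors`, `average_hops`, `scattered_mapping`) are hypothetical, and using average hops per message as a proxy for contention is an assumption made for illustration only.

```python
import random
from itertools import product


def torus_hops(a, b, dims):
    """Minimal hop count between coordinates a and b on a torus of size dims.

    In each dimension a message can travel either way around the ring,
    so the distance is the shorter of the two directions.
    """
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def stencil_neighbors(task, dims):
    """Yield the six nearest neighbors of a task in a periodic 3D stencil."""
    for axis in range(3):
        for step in (-1, 1):
            neighbor = list(task)
            neighbor[axis] = (neighbor[axis] + step) % dims[axis]
            yield tuple(neighbor)


def average_hops(mapping, dims):
    """Average hops per stencil message for a task -> processor mapping."""
    total = 0
    count = 0
    for task in product(*(range(d) for d in dims)):
        for neighbor in stencil_neighbors(task, dims):
            total += torus_hops(mapping[task], mapping[neighbor], dims)
            count += 1
    return total / count


def scattered_mapping(dims, seed=0):
    """A topology-oblivious mapping: tasks placed on shuffled processors."""
    tasks = list(product(*(range(d) for d in dims)))
    procs = tasks[:]
    random.Random(seed).shuffle(procs)
    return dict(zip(tasks, procs))


# Hypothetical 4 x 4 x 4 torus partition, one task per processor.
dims = (4, 4, 4)
aligned = {t: t for t in product(*(range(d) for d in dims))}
scattered = scattered_mapping(dims)

aligned_hops = average_hops(aligned, dims)      # every message travels 1 hop
scattered_hops = average_hops(scattered, dims)  # messages cross several links
```

In this sketch the aligned mapping places stencil neighbors on adjacent processors, so each message uses a single link, while the scattered mapping sends messages across several links that other messages may also need. A real mapping would obtain the partition dimensions and processor coordinates from the machine at runtime rather than assuming them.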