Topology-aware task mapping for reducing communication contention on large parallel machines

Authors:
Tarun Agarwal;Amit Sharma;Laxmikant V. Kalé
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign
Venue:
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Year:
2006

Abstract

Communication latency is a significant factor in the performance of parallel applications. With techniques such as wormhole routing, the variation in no-load latency became insignificant: the no-load latency to a far-away processor is not significantly higher than that to a nearby processor, and the difference is too small to matter. Contention in the network is then left as the major factor affecting latency. On networks such as fat-trees or hypercubes, where the number of wires grows as P log P, even contention is not a very significant factor. However, on the torus and grid networks now used in large machines such as BlueGene/L and the Cray XT3, contention becomes an issue. We quantify its effect with benchmarks that vary the number of hops traveled by each communicated byte. We then demonstrate a process mapping strategy that minimizes the impact of topology by heuristically minimizing the total number of hop-bytes communicated. This strategy and its variants are implemented in an adaptive runtime system in Charm++ and AdaptiveMPI, so they are available to a broad class of applications.
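The hop-bytes metric named in the abstract can be illustrated with a small sketch. The code below is an illustration only and does not reproduce the paper's heuristic or any Charm++ API. It assumes a torus with wraparound links and shortest-path routing, and it assumes that hop-bytes is the sum, over communicating task pairs, of bytes exchanged times hops between the processors hosting the two tasks. The function names `torus_hops`, `hop_bytes`, and `swap_refine` are hypothetical, and the pairwise-swap local search is a generic stand-in for a mapping heuristic.

```python
def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus (wraparound links)."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def hop_bytes(comm, mapping, dims):
    """Total hop-bytes: bytes exchanged by each task pair times the hops between them.

    comm:    {(task_s, task_t): bytes}  (assumed communication graph)
    mapping: {task: processor coordinate tuple}
    """
    return sum(nbytes * torus_hops(mapping[s], mapping[t], dims)
               for (s, t), nbytes in comm.items())


def swap_refine(comm, mapping, dims):
    """Generic local search (not the paper's algorithm): keep any swap of two
    tasks' processors that lowers total hop-bytes, until no swap helps."""
    mapping = dict(mapping)
    best = hop_bytes(comm, mapping, dims)
    tasks = list(mapping)
    improved = True
    while improved:
        improved = False
        for i in range(len(tasks)):
            for j in range(i + 1, len(tasks)):
                a, b = tasks[i], tasks[j]
                mapping[a], mapping[b] = mapping[b], mapping[a]
                cost = hop_bytes(comm, mapping, dims)
                if cost < best:
                    best = cost
                    improved = True
                else:
                    # Undo a swap that did not reduce hop-bytes.
                    mapping[a], mapping[b] = mapping[b], mapping[a]
    return mapping, best


# Four tasks communicating in a ring, placed on a 4-processor 1D torus.
dims = (4,)
comm = {(0, 1): 100, (1, 2): 100, (2, 3): 100, (3, 0): 100}
initial = {0: (0,), 1: (2,), 2: (1,), 3: (3,)}

initial_cost = hop_bytes(comm, initial, dims)
refined, refined_cost = swap_refine(comm, initial, dims)
print(initial_cost, refined_cost)
```

In this toy case the scattered placement forces two of the four messages to travel two hops each, and the refinement finds a placement in which every communicating pair is one hop apart. Exhaustive pairwise swapping recomputes the full cost at every step, so it is suitable only for illustrating the metric on small inputs.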