Avoiding hot-spots on two-level direct networks

Authors:
Abhinav Bhatele;Nikhil Jain;William D. Gropp;Laxmikant V. Kale
Affiliations:
Lawrence Livermore National Laboratory, Livermore, CA;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2011

Abstract

A low-diameter, fast interconnection network will be a prerequisite for building exascale machines. Several groups have proposed a two-level direct network as a scalable design for future machines; IBM's PERCS topology and the dragonfly network discussed in the DARPA exascale hardware study are examples of this design. Because the design has multiple levels, grouping processes together at the lowest level to minimize total communication volume creates hot-spots on a few links. This is especially true for communication graphs with a small number of neighbors per task. Routing and mapping choices can therefore affect the communication performance of parallel applications running on a machine with a two-level direct topology. This paper explores intelligent topology-aware mappings of different communication patterns to the physical topology to identify cases that minimize link utilization. We also analyze the trade-offs between direct and indirect routing under different mappings. Because no two-level direct networks have been installed yet, we use simulations to study communication and overall application performance. This study raises interesting issues regarding the choice of job scheduling, routing, and mapping for future machines.
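
The hot-spot effect described in the abstract can be illustrated with a toy model. The sketch below is not the simulator or the mappings used in the paper. It assumes a heavily simplified two-level network in which every pair of groups shares exactly one global link, intra-group links are ignored, and the application is a periodic 2D four-point stencil. The function names, the grid and block sizes, and the random-intermediate-group rule for indirect routing are all illustrative assumptions.

```python
import random
from collections import Counter

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def blocked_mapping(side, block):
    """Place each block x block tile of the task grid in one group."""
    per_row = side // block
    return {(x, y): (x // block) * per_row + (y // block)
            for x in range(side) for y in range(side)}

def random_mapping(side, block, seed=0):
    """Scatter tasks over groups at random, block*block tasks per group."""
    tasks = [(x, y) for x in range(side) for y in range(side)]
    random.Random(seed).shuffle(tasks)
    per_group = block * block
    return {task: i // per_group for i, task in enumerate(tasks)}

def global_link_loads(side, mapping, n_groups, indirect=False, seed=0):
    """Count messages on each global (inter-group) link.

    Toy assumption: one undirected link per pair of groups.
    Direct routing uses the link between source and destination groups.
    Indirect routing first goes to a random intermediate group.
    """
    rng = random.Random(seed)
    loads = Counter()
    for x in range(side):
        for y in range(side):
            src = mapping[(x, y)]
            for dx, dy in NEIGHBORS:
                dst = mapping[((x + dx) % side, (y + dy) % side)]
                if src == dst:
                    continue  # message stays inside the group
                if indirect:
                    mid = rng.randrange(n_groups)
                    while mid in (src, dst):
                        mid = rng.randrange(n_groups)
                    hops = ((src, mid), (mid, dst))
                else:
                    hops = ((src, dst),)
                for a, b in hops:
                    loads[frozenset((a, b))] += 1
    return loads

def summarize(loads):
    """Return (total link traversals, links used, maximum load on a link)."""
    return sum(loads.values()), len(loads), max(loads.values())

# Illustrative sizes: 16 x 16 groups, each holding an 8 x 8 tile of tasks.
BLOCK = 8
SIDE = 16 * BLOCK
N_GROUPS = (SIDE // BLOCK) ** 2

blocked = blocked_mapping(SIDE, BLOCK)
scattered = random_mapping(SIDE, BLOCK)

blocked_direct = summarize(global_link_loads(SIDE, blocked, N_GROUPS))
scattered_direct = summarize(global_link_loads(SIDE, scattered, N_GROUPS))
blocked_indirect = summarize(
    global_link_loads(SIDE, blocked, N_GROUPS, indirect=True))

for name, stats in (("blocked, direct", blocked_direct),
                    ("scattered, direct", scattered_direct),
                    ("blocked, indirect", blocked_indirect)):
    print(f"{name:18s} total={stats[0]:6d} links={stats[1]:6d} max={stats[2]}")
```

In this toy setting the blocked mapping sends the fewest messages across groups but concentrates them on the few links that join neighboring tiles. Scattering tasks, or routing through an intermediate group, increases the total global traffic while spreading it over many more links. Real two-level networks such as PERCS have additional link types and bandwidth ratios that this sketch does not model.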