A low-diameter, fast interconnection network will be a prerequisite for building exascale machines. Several groups have proposed a two-level direct network as a scalable design for future machines; IBM's PERCS topology and the dragonfly network discussed in the DARPA exascale hardware study are examples of this design. Because the design has multiple levels, grouping processes together at the lowest level to minimize total communication volume creates hot-spots on a few links. This is especially true for communication graphs with a small number of neighbors per task. Routing and mapping choices can therefore affect the communication performance of parallel applications running on a machine with a two-level direct topology. This paper explores intelligent topology-aware mappings of different communication patterns to the physical topology to identify cases that minimize link utilization. We also analyze the trade-offs between direct and indirect routing under different mappings. Because no two-level direct networks have been installed yet, we use simulations to study the communication and overall performance of applications. This study raises interesting issues regarding the choice of job scheduling, routing, and mapping for future machines.
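
The hot-spot effect described above can be illustrated with a toy model. The sketch below is not the simulator used in the paper. It assumes a hypothetical two-level network with exactly one global link between every pair of groups, ignores links inside a group, and counts messages rather than bytes. The workload is a 2D four-neighbor stencil with wraparound, chosen as an example of a communication graph with few neighbors per task. All function and variable names are illustrative.

```python
import random
from collections import Counter
from itertools import product


def stencil_messages(width, height):
    """Yield (src, dst) task pairs for a 2D 4-neighbor stencil with wraparound."""
    for x, y in product(range(width), range(height)):
        src = y * width + x
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            dst = ((y + dy) % height) * width + (x + dx) % width
            yield src, dst


def link_loads(messages, group_of, num_groups, indirect=False, rng=None):
    """Count messages per global link, assuming one link per pair of groups."""
    loads = Counter()
    for src, dst in messages:
        a, b = group_of[src], group_of[dst]
        if a == b:
            continue  # intra-group traffic never touches a global link
        if indirect:
            # Indirect (Valiant-style) routing: detour via a random third group.
            mid = rng.choice([g for g in range(num_groups) if g not in (a, b)])
            hops = [(a, mid), (mid, b)]
        else:
            hops = [(a, b)]
        for u, v in hops:
            loads[(min(u, v), max(u, v))] += 1
    return loads


WIDTH, HEIGHT, NUM_GROUPS = 32, 32, 16
msgs = list(stencil_messages(WIDTH, HEIGHT))
tasks_per_group = WIDTH * HEIGHT // NUM_GROUPS

# Blocked mapping: consecutive tasks (two grid rows each) share a group.
blocked = [t // tasks_per_group for t in range(WIDTH * HEIGHT)]

# Random mapping: same group sizes, tasks scattered across groups.
rng = random.Random(0)
scattered = blocked[:]
rng.shuffle(scattered)

results = {
    "blocked, direct": link_loads(msgs, blocked, NUM_GROUPS),
    "blocked, indirect": link_loads(msgs, blocked, NUM_GROUPS, indirect=True, rng=rng),
    "random, direct": link_loads(msgs, scattered, NUM_GROUPS),
}
for name, loads in results.items():
    print(f"{name}: max load={max(loads.values())}, "
          f"links used={len(loads)}, total link traffic={sum(loads.values())}")
```

In this toy setting the blocked mapping with direct routing sends all inter-group traffic over the 16 links that join neighboring groups, out of 120 available, and each of those links carries 64 messages. Indirect routing doubles the total global-link traffic but spreads it over more links. Random mapping also spreads traffic, at the cost of a much larger total volume. Which option gives the lowest maximum link load depends on the pattern and the mapping, which is the trade-off the abstract refers to.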