Scalable node allocation for improved performance in regular and anisotropic 3D torus supercomputers

Authors:
Carl Albing;Norm Troullier;Stephen Whalen;Ryan Olson;Joe Glenski;Howard Pritchard;Hugo Mills
Affiliations:
University of Reading, Reading, Berkshire, UK and Cray Inc., Saint Paul, MN;Cray Inc., Saint Paul, MN;Cray Inc., Saint Paul, MN and University of Minnesota, Minneapolis, MN;Cray Inc., Saint Paul, MN;Cray Inc., Saint Paul, MN;Cray Inc., Saint Paul, MN;University of Reading, Reading, Berkshire, UK
Venue:
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Year:
2011

Abstract

MPI application performance can vary with how the scheduler places ranks, whether across nodes or on cores within the same multi-core chip. By default, MPI applications are at the mercy of the application placement software that assigns nodes to a job. We describe a general approach to node ordering for allocation in a 3D torus and show how it improved MPI application performance, even in the face of an anisotropic interconnect. We demonstrate quantitatively that our topology-based ordering improves performance for several MPI applications running on a Top10 supercomputer.
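
To make the idea of node ordering concrete, the following is a minimal illustrative sketch, not the ordering used in the paper, which the abstract does not specify. It assumes a simple serpentine (boustrophedon) walk over the torus coordinates, so that nodes adjacent in the allocation list are also one hop apart in the network. The function names `torus_hops` and `snake_order` and the example dimensions are hypothetical.

```python
def torus_hops(a, b, dims):
    """Minimal hop count between nodes a and b in a torus with wraparound links."""
    return sum(min(abs(p - q), d - abs(p - q)) for p, q, d in zip(a, b, dims))


def snake_order(dims):
    """Serpentine ordering of (x, y, z) coordinates (an assumed, illustrative scheme).

    z varies fastest and reverses direction on every row; y reverses on every
    x-plane. Consecutive entries therefore differ by one step in one dimension.
    """
    X, Y, Z = dims
    order = []
    for x in range(X):
        ys = range(Y) if x % 2 == 0 else range(Y - 1, -1, -1)
        for yi, y in enumerate(ys):
            row = x * Y + yi  # global row index decides the z direction
            zs = range(Z) if row % 2 == 0 else range(Z - 1, -1, -1)
            for z in zs:
                order.append((x, y, z))
    return order


def max_consecutive_hops(order, dims):
    """Largest hop distance between neighbours in an allocation list."""
    return max(torus_hops(a, b, dims) for a, b in zip(order, order[1:]))


if __name__ == "__main__":
    dims = (4, 3, 5)  # hypothetical torus size
    snake = snake_order(dims)
    row_major = [(x, y, z) for x in range(dims[0])
                 for y in range(dims[1]) for z in range(dims[2])]
    print("snake:", max_consecutive_hops(snake, dims))
    print("row-major:", max_consecutive_hops(row_major, dims))
```

An allocator that hands out nodes from such a list in contiguous runs tends to give each job a compact region of the torus. For an anisotropic interconnect, one could choose which dimension varies fastest according to link bandwidth; that choice is left out of this sketch.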