HiCOO: Hierarchical cooperation for scalable communication in Global Address Space programming models on Cray XT systems

Authors:
Weikuan Yu;Xinyu Que;Vinod Tipparaju;Jeffrey S. Vetter
Affiliations:
Department of Computer Science, Auburn University, AL 36849, USA;Department of Computer Science, Auburn University, AL 36849, USA;Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA;Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
Venue:
Journal of Parallel and Distributed Computing
Year:
2012


Abstract

Global Address Space (GAS) programming models provide a convenient, shared-memory-style addressing model. Typically this is realized through one-sided operations that enable asynchronous communication and data movement. With petascale systems reaching tens of thousands of nodes and hundreds of thousands of cores, the underlying runtime systems face critical challenges in (1) scalably managing resources (such as memory for communication buffers), and (2) gracefully handling unpredictable communication patterns and any associated contention. Any solution to these resource scalability challenges must also maintain the performance of GAS programming models. In this paper, we describe a Hierarchical COOperation (HiCOO) architecture for scalable communication in GAS programming models. HiCOO formulates a cooperative communication architecture that combines inter-node cooperation among multiple nodes (a.k.a. a multinode) with hierarchical cooperation among multinodes arranged in various virtual topologies. We have implemented HiCOO for a popular GAS runtime library, the Aggregate Remote Memory Copy Interface (ARMCI). By extensively evaluating different virtual topologies in HiCOO in terms of their impact on memory scalability, network contention, and application performance, we identify MFCG as the most suitable virtual topology. The resulting HiCOO architecture realizes scalable resource management and achieves resilience to network contention, while maintaining or enhancing the performance of scientific applications. In one case, it reduces the total execution time of an NWChem application by 52%.
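
To make the memory-scalability argument concrete, the following is a minimal sketch, not the HiCOO or ARMCI implementation. It compares how many remote endpoints need dedicated communication buffers under a flat, fully connected layout versus a hypothetical hierarchical layout. The grid arrangement (each multinode keeps direct links only to multinodes in its own row and column), the function names, and the multinode size are all assumptions made for illustration; the abstract does not define the MFCG topology at this level of detail.

```python
import math


def direct_peers_flat(num_nodes: int) -> int:
    """Endpoints a node buffers for when every node talks to every other node."""
    return num_nodes - 1


def direct_peers_hierarchical(num_nodes: int, multinode_size: int) -> int:
    """Endpoints buffered for under an assumed two-level layout.

    Assumption (illustrative only): nodes are grouped into multinodes of
    `multinode_size`; multinodes sit on a near-square grid and each keeps
    direct links only to the multinodes in its own row and column.
    """
    # Number of multinodes, rounded up so every node belongs to one.
    num_multinodes = (num_nodes + multinode_size - 1) // multinode_size
    # Near-square grid: rows = ceil(sqrt(m)), cols = ceil(m / rows).
    rows = math.isqrt(num_multinodes - 1) + 1
    cols = (num_multinodes + rows - 1) // rows
    intra = multinode_size - 1          # cooperating nodes inside the multinode
    inter = (rows - 1) + (cols - 1)     # multinodes in the same row and column
    return intra + inter


if __name__ == "__main__":
    n = 10_000
    print("flat:", direct_peers_flat(n))
    print("hierarchical:", direct_peers_hierarchical(n, multinode_size=4))
```

Under these assumptions the per-endpoint buffer count grows roughly with the square root of the number of multinodes rather than linearly with the node count. The price is that messages between multinodes without a direct link must be forwarded, which is where topology choice affects contention and application performance.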