Enabling a highly-scalable global address space model for petascale computing

Authors:
Vinod Tipparaju;Edoardo Aprà;Weikuan Yu;Jeffrey S. Vetter
Affiliations:
Oak Ridge National Laboratory, Oak Ridge, TN, USA;Oak Ridge National Laboratory, Oak Ridge, TN, USA;Auburn University, Auburn, AL, USA;Oak Ridge National Laboratory, Oak Ridge, TN, USA
Venue:
Proceedings of the 7th ACM international conference on Computing frontiers
Year:
2010

Abstract

Over the past decade, the trajectory to the petascale has been built on increased complexity and scale of the underlying parallel architectures. Meanwhile, software developers have struggled to provide tools that maintain the productivity of computational science teams using these new systems. In this regard, Global Address Space (GAS) programming models provide a straightforward, easy-to-use addressing model, which can lead to improved productivity. However, the scalability of GAS depends directly on the design and implementation of the runtime system on the target petascale distributed-memory architecture. In this paper, we describe the design, implementation, and optimization of the Aggregate Remote Memory Copy Interface (ARMCI) runtime library on the Cray XT5 2.3 PetaFLOPs computer at Oak Ridge National Laboratory. We optimized our implementation with the flow intimation technique, which we introduce in this paper. Our optimized ARMCI implementation improves the scalability of both the Global Arrays (GA) programming model and a real-world chemistry application, NWChem, from small jobs up through 180,000 cores.