Portability, efficiency, and ease of coding are all important considerations in choosing the programming model for a scalable parallel application. The message-passing programming model is widely used because of its portability, yet some applications are too complex to code in it while also maintaining a balanced computational load and avoiding redundant computation. The shared-memory programming model simplifies coding, but it is not portable and often provides little control over interprocessor data-transfer costs. This paper describes a new approach, called Global Arrays (GA), that combines the better features of both models, yielding both simple coding and efficient execution. The key concept of GA is a portable interface through which each process in a MIMD parallel program can asynchronously access logical blocks of physically distributed matrices, with no need for explicit cooperation by other processes. We have implemented GA libraries on a variety of computer systems, including the Intel DELTA and Paragon and the IBM SP-1 (all distributed-memory message-passing machines), the Kendall Square KSR-2 (a nonuniform-access shared-memory machine), and networks of Unix workstations. We discuss the design and implementation of these libraries, report their performance, illustrate the use of GA in the context of computational chemistry applications, and describe the use of a GA performance visualization tool.
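The core idiom the abstract describes — any process reading or updating an arbitrary logical block of a physically distributed matrix, without the owning process's cooperation — can be sketched conceptually. The following is a toy single-address-space model, not the real GA API; the class and method names (`GlobalArray`, `get`, `acc`) are illustrative stand-ins for GA's block get/accumulate operations.

```python
# Conceptual sketch (NOT the real Global Arrays API): a logically shared
# 2-D array whose rows are physically partitioned among owners, with
# block-wise get and accumulate access from any process.
class GlobalArray:
    """Toy model of a GA-style distributed matrix.

    Each "process" owns a contiguous band of rows (the physical
    distribution); any process may read or update an arbitrary logical
    block one-sidedly, simulated here within one address space.
    """

    def __init__(self, nrows, ncols, nprocs):
        self.nrows, self.ncols = nrows, ncols
        # Row-banded distribution: row i belongs to process owner[i].
        band = (nrows + nprocs - 1) // nprocs
        self.owner = [min(i // band, nprocs - 1) for i in range(nrows)]
        self.data = [[0.0] * ncols for _ in range(nrows)]

    def get(self, lo, hi):
        """Copy the logical block with corners lo..hi (inclusive)."""
        return [row[lo[1]:hi[1] + 1] for row in self.data[lo[0]:hi[0] + 1]]

    def acc(self, lo, hi, block):
        """Accumulate (element-wise add) a local block into the array."""
        for i, row in enumerate(block):
            for j, v in enumerate(row):
                self.data[lo[0] + i][lo[1] + j] += v


# A process updates a 2x2 block and reads it back, regardless of which
# process the underlying rows physically belong to.
ga = GlobalArray(4, 4, nprocs=2)
ga.acc((0, 0), (1, 1), [[1.0, 2.0], [3.0, 4.0]])
blk = ga.get((0, 0), (1, 1))
```

In the real library these operations become one-sided communication (remote get/accumulate) against the owning node's memory, which is what lets GA combine shared-memory-style coding with explicit, locality-aware data transfer.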