Scalable heterogeneous computing (SHC) architectures are emerging in response to new requirements for low cost, power efficiency, and high performance. For example, many contemporary HPC systems use commodity Graphics Processing Units (GPUs) to supplement traditional multicore processors. Yet scientists still face a number of challenges in using SHC systems. First and foremost, they must combine several programming models and then carefully optimize data movement among these models on each architecture. In this paper, we investigate a new programming model for SHC systems that unifies data access to the aggregate memory available in the system's GPUs. In particular, we extend the popular and easy-to-use Global Address Space (GAS) programming model to SHC systems. We explore multiple implementation options and demonstrate our solution in the context of Global Arrays, a library-based GAS model. We then evaluate these options on kernels and applications, including the scalable chemistry application NWChem. Our results show that GA-GPU offers considerable programmability benefits to users, and both our empirical results and our performance model indicate encouraging performance for future systems with tightly integrated memory.
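The core idea — a single logical array whose partitions live in separate (here, GPU) memories, accessed through global indices via put/get — can be illustrated with a minimal sketch. This is a toy model of the PGAS abstraction only; the class and method names below are hypothetical and are not the Global Arrays or GA-GPU API, and each Python list merely stands in for one device's local memory partition.

```python
# Toy sketch of the PGAS get/put pattern that GA-style models provide.
# Hypothetical names throughout: ToyGlobalArray is NOT the GA-GPU API.
# A logical 1-D array of `total` elements is block-partitioned across
# `ndev` simulated device memories; put/get translate global indices
# into (device id, local offset) pairs.

class ToyGlobalArray:
    def __init__(self, total, ndev):
        self.chunk = (total + ndev - 1) // ndev   # block size per device
        # Each inner list stands in for one device's local partition.
        self.parts = [[0.0] * self.chunk for _ in range(ndev)]
        self.total = total

    def _locate(self, i):
        # Map a global index to (device id, local offset).
        return divmod(i, self.chunk)

    def put(self, lo, values):
        # Write a contiguous global range, possibly spanning devices.
        for k, v in enumerate(values):
            d, off = self._locate(lo + k)
            self.parts[d][off] = v

    def get(self, lo, hi):
        # Read the global range [lo, hi), gathering across devices.
        out = []
        for i in range(lo, hi):
            d, off = self._locate(i)
            out.append(self.parts[d][off])
        return out

ga = ToyGlobalArray(total=8, ndev=4)
ga.put(2, [1.0, 2.0, 3.0])   # spans the device-1/device-2 boundary
print(ga.get(2, 5))          # -> [1.0, 2.0, 3.0]
```

The point of the sketch is that the caller addresses the array with global indices and never names a device; in a real GA-GPU implementation, the index translation would instead drive host-device or device-device transfers.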