Designing a unified programming model for heterogeneous machines

Authors:
Michael Garland;Manjunath Kudlur;Yili Zheng
Affiliations:
NVIDIA;NVIDIA;Lawrence Berkeley National Lab
Venue:
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Year:
2012

Citing 19
Cited 3

Cilk: an efficient multithreaded runtime system

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
GASNet Specification, v1.1

GASNet Specification, v1.1
X10: an object-oriented approach to non-uniform cluster computing

OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
TreadMarks: distributed shared memory on standard workstations and operating systems

WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Parallel Languages and Compilers: Perspective From the Titanium Experience

International Journal of High Performance Computing Applications
Parallel Programmability and the Chapel Language

International Journal of High Performance Computing Applications
OpenMP to GPGPU: a compiler framework for automatic translation and optimization

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
The Cilk++ concurrency platform

Proceedings of the 46th Annual Design Automation Conference
Heterogeneous multicore parallel programming for graphics processing units

Scientific Programming - Software Development for Multi-core Computing Systems
ParalleX An Advanced Parallel Execution Model for Scaling-Impaired Applications

ICPPW '09 Proceedings of the 2009 International Conference on Parallel Processing Workshops
Implementing the PGI Accelerator model

Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Understanding throughput-oriented architectures

Communications of the ACM
hiCUDA: High-Level GPGPU Programming

IEEE Transactions on Parallel and Distributed Systems
Programming the memory hierarchy revisited: supporting irregular parallelism in sequoia

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
GPUs and the Future of Parallel Computing

IEEE Micro
Habanero-Java: the new adventures of old X10

Proceedings of the 9th International Conference on Principles and Practice of Programming in Java
Hierarchical place trees: a portable abstraction for task parallelism and data movement

LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing

Accelerating simulation of agent-based models on heterogeneous architectures

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Portable mapping of openMP to multicore embedded systems using MCA APIs

Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Semi-automatic restructuring of offloadable tasks for many-core accelerators

SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Quantified Score

Hi-index	0.00

Visualization

Abstract

While high-efficiency machines are increasingly embracing heterogeneous architectures and massive multithreading, contemporary mainstream programming languages reflect a mental model in which processing elements are homogeneous, concurrency is limited, and memory is a flat undifferentiated pool of storage. Moreover, the current state of the art in programming heterogeneous machines tends towards using separate programming models, such as OpenMP and CUDA, for different portions of the machine. Both of these factors make programming emerging heterogeneous machines unnecessarily difficult. We describe the design of the Phalanx programming model, which seeks to provide a unified programming model for heterogeneous machines. It provides constructs for bulk parallelism, synchronization, and data placement which operate across the entire machine. Our prototype implementation is able to launch and coordinate work on both CPU and GPU processors within a single node, and by leveraging the GASNet runtime, is able to run across all the nodes of a distributed-memory machine.