Towards efficient GPU sharing on multicore processors

Authors:
Lingyuan Wang;Miaoqing Huang;Tarek El-Ghazawi
Affiliations:
The George Washington University, Washington, DC, USA;University of Arkansas, Fayetteville, AR, USA;The George Washington University, Washington, DC, USA
Venue:
Proceedings of the second international workshop on Performance modeling, benchmarking and simulation of high performance computing systems
Year:
2011

Abstract

Scalable systems that combine GPUs with CPUs are becoming increasingly prevalent in high-performance computing (HPC). Such accelerators introduce significant challenges and complexity for both language developers and end users. This paper closely studies efficient coordination mechanisms for handling parallel requests from multiple hosts of control to a GPU under hybrid programming. Using a set of microbenchmarks and applications on a GPU cluster, we show that thread-based and process-based context hosting have different tradeoffs. Experimental results on application benchmarks suggest that thread-based context funneling and process-based context switching natively perform similarly on the latest Fermi GPU, while manually guided context funneling is currently the best way to achieve optimal performance.
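As an illustration only, the sketch below shows one possible reading of thread-based context funneling: several host threads submit requests to a single thread that stands in for the sole owner of a GPU context. It is not the authors' implementation. The names `FunnelThread`, `fake_kernel`, and `run_demo` are hypothetical, and the "kernel" is plain Python rather than a real GPU launch.

```python
import queue
import threading
from concurrent.futures import Future


class FunnelThread:
    """Hypothetical funnel: one thread stands in for the sole GPU context owner."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Assumption: a real implementation would create the GPU context
        # here, once, and reuse it for every request.
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            fn, args, future = item
            try:
                future.set_result(fn(*args))
            except Exception as exc:
                future.set_exception(exc)

    def submit(self, fn, *args):
        # Called from any host thread; the work runs on the funnel thread.
        future = Future()
        self._requests.put((fn, args, future))
        return future

    def close(self):
        self._requests.put(None)
        self._thread.join()


def fake_kernel(xs):
    # Placeholder for a GPU kernel launch; also reports the executing thread.
    return [x * 2 for x in xs], threading.get_ident()


def run_demo(num_workers=4):
    funnel = FunnelThread()
    results = [None] * num_workers

    def worker(i):
        # Each host thread blocks until its funneled request completes.
        results[i] = funnel.submit(fake_kernel, [i, i + 1]).result()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    funnel.close()
    return results
```

Calling `run_demo()` returns one result per host thread, and every result carries the same executing-thread identifier, which is the defining property of funneling. A process-based alternative would instead give each host process its own context and leave the driver to switch between them.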