Fine-grained resource sharing for concurrent GPGPU kernels

  • Authors:
  • Chris Gregg; Jonathan Dorn; Kim Hazelwood; Kevin Skadron

  • Affiliations:
  • Department of Computer Science, University of Virginia, Charlottesville, VA (all authors)

  • Venue:
  • HotPar'12 Proceedings of the 4th USENIX conference on Hot Topics in Parallelism
  • Year:
  • 2012

Abstract

General purpose GPU (GPGPU) programming frameworks such as OpenCL and CUDA allow running individual computation kernels sequentially on a device. However, in some cases it is possible to utilize device resources more efficiently by running kernels concurrently. This raises questions about load balancing and resource allocation that have not previously warranted investigation. For example, what kernel characteristics impact the optimal partitioning of resources among concurrently executing kernels? Current frameworks do not provide the ability to easily run kernels concurrently with fine-grained and dynamic control over resource partitioning. We present KernelMerge, a kernel scheduler that runs two OpenCL kernels concurrently on one device. KernelMerge furnishes a number of settings that can be used to survey concurrent or single-kernel configurations, and to investigate how kernels interact and influence each other (or themselves). KernelMerge provides a concurrent kernel scheduler compatible with the OpenCL API. We present an argument for the benefits of running kernels concurrently. We demonstrate how to use KernelMerge to increase throughput for two kernels that use device resources efficiently when run concurrently, and we establish that some kernels show worse performance when run concurrently. We also outline a method for using KernelMerge to investigate how concurrent kernels influence each other, with the goal of predicting concurrent runtimes from individual kernel runtimes. Finally, we suggest GPU architectural changes that would improve such concurrent schedulers in the future.
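
To make the contrast with today's programming model concrete, the C sketch below (illustrative only; the kernel handles, work sizes, and helper function are hypothetical and not taken from the paper) shows how two OpenCL kernels are typically enqueued on one in-order command queue: the second kernel cannot begin until the first completes, and how device resources are shared is left entirely to the driver. The paper's KernelMerge scheduler is compatible with this API but instead runs the two kernels concurrently, with control over how resources are partitioned between them.

    /* Minimal sketch (not the authors' code): enqueueing two kernels on a
     * single in-order OpenCL command queue. The kernel objects, global
     * sizes, and this helper are hypothetical. */
    #include <CL/cl.h>
    #include <stddef.h>

    static void run_pair_sequentially(cl_command_queue queue,
                                      cl_kernel kernel_a, cl_kernel kernel_b,
                                      size_t global_a, size_t global_b)
    {
        /* With an in-order queue, kernel_b does not start until kernel_a
         * has drained, even if kernel_a leaves compute units idle. */
        clEnqueueNDRangeKernel(queue, kernel_a, 1, NULL, &global_a, NULL,
                               0, NULL, NULL);
        clEnqueueNDRangeKernel(queue, kernel_b, 1, NULL, &global_b, NULL,
                               0, NULL, NULL);
        clFinish(queue);
        /* A concurrent scheduler such as KernelMerge instead dispatches
         * work from both kernels at once, exposing fine-grained control
         * over how device resources are split between them. */
    }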