Efficient scheduling of recursive control flow on GPUs

Authors:
Xin Huo; Sriram Krishnamoorthy; Gagan Agrawal
Affiliations:
The Ohio State University, Columbus, OH, USA; Pacific Northwest National Laboratory, Richland, WA, USA; The Ohio State University, Columbus, OH, USA
Venue:
Proceedings of the 27th ACM International Conference on Supercomputing
Year:
2013

Abstract

Graphics processing units (GPUs) have rapidly become a significant platform for high-performance computing. GPUs typically use single-instruction, multiple-thread (SIMT) pipelines to exploit parallelism and maximize performance. Although GPUs now support unstructured control flow, efficiently managing thread divergence for arbitrary parallel programs remains a critical challenge. In this paper, we focus on the problem of supporting recursion on modern GPUs. We design and comparatively evaluate several algorithms for managing the thread divergence encountered in recursive programs. The results improve upon traditional post-dominator-based reconvergence mechanisms, which were designed to handle thread divergence due to control flow within a single procedure.
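
To make the divergence problem concrete, the following is a minimal toy model in Python. It is an illustration only, not one of the paper's algorithms and not a description of real GPU hardware. It assumes a single warp whose lanes each compute a recursive Fibonacci number in lockstep, with lanes that have reached the base case masked off until the whole call returns. The function name `simulate_warp_fib` and the utilization metric (useful lane-slots divided by issued lane-slots) are hypothetical and were chosen for this sketch.

```python
def simulate_warp_fib(args):
    """Toy lockstep model: every lane computes fib(args[i]) in one warp.

    Returns (results, utilization), where utilization is the fraction of
    issued lane-slots that did useful work. This is an assumption-laden
    sketch, not a model of any specific GPU.
    """
    width = len(args)
    stats = {"issued": 0, "useful": 0}

    def fib_warp(ns, mask):
        # One warp-wide call: all lanes occupy an issue slot,
        # but only lanes enabled in `mask` do useful work.
        stats["issued"] += width
        stats["useful"] += sum(mask)
        out = [0] * width

        # Divergent branch: base-case lanes finish here,
        # the remaining active lanes must recurse.
        recurse = [m and n >= 2 for n, m in zip(ns, mask)]
        for i in range(width):
            if mask[i] and not recurse[i]:
                out[i] = ns[i]  # fib(0) = 0, fib(1) = 1

        if any(recurse):
            # Finished lanes stay masked off (idle) during both calls.
            a = fib_warp([n - 1 for n in ns], recurse)
            b = fib_warp([n - 2 for n in ns], recurse)
            for i in range(width):
                if recurse[i]:
                    out[i] = a[i] + b[i]
        return out

    results = fib_warp(list(args), [True] * width)
    return results, stats["useful"] / stats["issued"]


if __name__ == "__main__":
    for inputs in ([5, 5, 5, 5], [1, 2, 3, 10]):
        values, utilization = simulate_warp_fib(inputs)
        print(inputs, values, round(utilization, 3))
```

In this model, a warp whose lanes all receive the same argument stays fully utilized, while a warp with mixed recursion depths spends most of its issue slots on masked-off lanes. That wasted capacity is the kind of inefficiency that a divergence-management scheme for recursion would aim to reduce.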