On the correctness of the SIMT execution model of GPUs

Authors: Axel Habermaier, Alexander Knapp
Affiliation: Institute for Software and Systems Engineering, University of Augsburg, Germany
Venue: ESOP'12, Proceedings of the 21st European Conference on Programming Languages and Systems
Year: 2012

Abstract

GPUs are becoming a primary resource of computing power. They use a single instruction, multiple threads (SIMT) execution model that executes batches of threads in lockstep. If the control flow of threads within the same batch diverges, the different execution paths are scheduled sequentially; once the control flows reconverge, all threads are executed in lockstep again. Several thread batching mechanisms have been proposed, albeit without establishing their semantic validity or their scheduling properties. To increase the level of confidence in the correctness of GPU-accelerated programs, we formalize the SIMT execution model for a stack-based reconvergence mechanism in an operational semantics and prove its correctness by constructing a simulation between the SIMT semantics and a standard interleaved multi-thread semantics. We also demonstrate that the SIMT execution model produces unfair schedules in some cases. We discuss the problem of unfairness for different batching mechanisms, such as dynamic warp formation and a stack-less reconvergence strategy.
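The divergence and reconvergence behaviour described in the abstract can be illustrated with a small simulation. The following Python sketch is not the paper's operational semantics; it is a toy model under stated assumptions: a hypothetical five-instruction program, one register per thread, no shared memory, and a reconvergence stack whose entries hold a program counter, a reconvergence point, and an active-thread mask. The names `PROGRAM`, `run_simt`, and `run_interleaved` are invented for illustration.

```python
END = 5  # assumed program counter value meaning "thread has finished"

# Hypothetical toy kernel: if x is even: x *= 10, else: x += 1; then x += 100.
PROGRAM = [
    ("branch", lambda x: x % 2 == 0, 3, 4),  # else-target 3, reconvergence point 4
    ("op", lambda x: x * 10),                # then-path
    ("jump", 4),                             # skip the else-path
    ("op", lambda x: x + 1),                 # else-path
    ("op", lambda x: x + 100),               # executed after reconvergence
]


def step_thread(pc, x):
    """Execute one instruction for a single thread; return (new_pc, new_x)."""
    kind, *args = PROGRAM[pc]
    if kind == "op":
        return pc + 1, args[0](x)
    if kind == "jump":
        return args[0], x
    cond, else_target, _reconv = args
    return (pc + 1 if cond(x) else else_target), x


def run_interleaved(regs):
    """Reference model: every thread has its own pc and advances round-robin."""
    regs = list(regs)
    pcs = [0] * len(regs)
    while any(pc != END for pc in pcs):
        for t in range(len(regs)):
            if pcs[t] != END:
                pcs[t], regs[t] = step_thread(pcs[t], regs[t])
    return regs


def run_simt(regs):
    """Lockstep model: one batch of threads driven by a reconvergence stack."""
    regs = list(regs)
    trace = []  # (pc, active threads) for every executed instruction
    stack = [[0, END, frozenset(range(len(regs)))]]
    while stack:
        pc, rpc, mask = stack[-1]
        if pc == rpc:  # reached the reconvergence point: drop this path
            stack.pop()
            continue
        trace.append((pc, sorted(mask)))
        kind, *args = PROGRAM[pc]
        if kind == "branch":
            cond, else_target, reconv = args
            taken = frozenset(t for t in mask if cond(regs[t]))
            not_taken = mask - taken
            if taken and not_taken:
                # Divergence: current entry waits at the reconvergence point,
                # the two paths are pushed and run one after the other.
                stack[-1][0] = reconv
                stack.append([else_target, reconv, not_taken])
                stack.append([pc + 1, reconv, taken])
            else:
                stack[-1][0] = pc + 1 if taken else else_target
        elif kind == "jump":
            stack[-1][0] = args[0]
        else:
            for t in mask:  # all active threads execute the same instruction
                regs[t] = args[0](regs[t])
            stack[-1][0] = pc + 1
    return regs, trace


final, trace = run_simt([0, 1, 2, 3])
print(final)
for pc, active in trace:
    print(pc, active)
```

For this input both models end with the same register values, which is the kind of agreement the simulation proof in the paper establishes in general. The trace shows the even threads running the then-path alone, then the odd threads running the else-path alone, and finally all four threads executing the last instruction together. Because the sketch has no shared memory or loops, it cannot exhibit the unfair schedules the abstract mentions.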