Advanced compiler design and implementation
Advanced compiler design and implementation
Theories of programming languages
Theories of programming languages
Chap - a SIMD graphics processor
SIGGRAPH '84 Proceedings of the 11th annual conference on Computer graphics and interactive techniques
Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Queue - GPU Computing
Parallel Computing Experiences with CUDA
IEEE Micro
CUDA Accelerated LTL Model Checking
ICPADS '09 Proceedings of the 2009 15th International Conference on Parallel and Distributed Systems
IEEE Micro
Dynamic warp subdivision for integrated branch and memory divergence tolerance
Proceedings of the 37th annual international symposium on Computer architecture
GPU-PRISM: An Extension of PRISM for General Purpose Graphics Processing Units
PDMC-HIBI '10 Proceedings of the 2010 Ninth International Workshop on Parallel and Distributed Methods in Verification, and Second International Workshop on High Performance Computational Systems Biology
Computer Architecture, Fifth Edition: A Quantitative Approach
Computer Architecture, Fifth Edition: A Quantitative Approach
Thread block compaction for efficient SIMT control flow
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
GPUVerify: a verifier for GPU kernels
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Interleaving and lock-step semantics for analysis and verification of GPU kernels
ESOP'13 Proceedings of the 22nd European conference on Programming Languages and Systems
A sound and complete abstraction for reasoning about parallel prefix sums
Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
Hi-index | 0.00 |
GPUs are becoming a primary resource of computing power. They use a single instruction, multiple threads (SIMT) execution model that executes batches of threads in lockstep. If the control flow of threads within the same batch diverges, the different execution paths are scheduled sequentially; once the control flows reconverge, all threads are executed in lockstep again. Several thread batching mechanisms have been proposed, albeit without establishing their semantic validity or their scheduling properties. To increase the level of confidence in the correctness of GPU-accelerated programs, we formalize the SIMT execution model for a stack-based reconvergence mechanism in an operational semantics and prove its correctness by constructing a simulation between the SIMT semantics and a standard interleaved multi-thread semantics. We also demonstrate that the SIMT execution model produces unfair schedules in some cases. We discuss the problem of unfairness for different batching mechanisms like dynamic warp formation and a stack-less reconvergence strategy.