Interleaving and lock-step semantics for analysis and verification of GPU kernels

Authors:
Peter Collingbourne; Alastair F. Donaldson; Jeroen Ketema; Shaz Qadeer
Affiliations:
Imperial College London, UK; Imperial College London, UK; Imperial College London, UK; Microsoft Research
Venue:
ESOP'13 Proceedings of the 22nd European conference on Programming Languages and Systems
Year:
2013

Abstract

We study the semantics of GPU kernels, the parallel programs that run on Graphics Processing Units (GPUs). We provide a novel lock-step execution semantics for GPU kernels represented by arbitrary reducible control flow graphs and compare this semantics with a traditional interleaving semantics. We show for terminating kernels that either both semantics compute identical results or both behave erroneously. The result induces a method that allows GPU kernels with arbitrary reducible control flow graphs to be verified via transformation to a sequential program that employs predicated execution. We implemented this method in the GPUVerify tool and evaluated it experimentally by comparing the tool with its previous version, which is based on a similar method for structured programs, i.e., programs where control is organised using if and while statements. The evaluation was based on a set of 163 open source and commercial GPU kernels. Among these kernels, 42 exhibit unstructured control flow, which our novel method can handle fully automatically but the previous method could not. The generality of the new method comes at a modest price: verification across our benchmark set was 2.25 times slower overall; however, the median slowdown across all kernels was 0.77, indicating that our novel technique yielded faster analysis in many cases.
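
To make the idea of predicated execution concrete, the following is a minimal sketch in Python and not the formalisation or the GPUVerify encoding from the paper. The toy kernel, the function names `run_per_thread` and `run_lock_step_predicated`, and the input data are all hypothetical. The first function runs each thread to completion in turn, which is one particular interleaving; the second steps all threads through the same straight-line code, with a per-thread predicate deciding whether each statement takes effect.

```python
# Hypothetical toy kernel (an assumption for illustration, not from the paper):
#   if (tid % 2 == 0) out[tid] = data[tid] * 2;
#   else              out[tid] = data[tid] + 1;


def run_per_thread(data):
    """Run the branching kernel one thread after another (one possible interleaving)."""
    out = [0] * len(data)
    for tid in range(len(data)):
        if tid % 2 == 0:
            out[tid] = data[tid] * 2
        else:
            out[tid] = data[tid] + 1
    return out


def run_lock_step_predicated(data):
    """Run all threads in lock-step over branch-free code guarded by predicates."""
    n = len(data)
    out = [0] * n
    # Every thread evaluates the branch condition into its own predicate.
    p = [tid % 2 == 0 for tid in range(n)]
    # "Then" statement: executed by all threads, takes effect only where p holds.
    for tid in range(n):
        if p[tid]:
            out[tid] = data[tid] * 2
    # "Else" statement: executed by all threads, takes effect only where p fails.
    for tid in range(n):
        if not p[tid]:
            out[tid] = data[tid] + 1
    return out


if __name__ == "__main__":
    sample = [3, 4, 5, 6]
    print(run_per_thread(sample))
    print(run_lock_step_predicated(sample))
```

Because each thread in this toy kernel writes only to its own element of `out`, there is no data race, and both functions return the same list. The sketch covers a single structured if/else only; it does not attempt the unstructured, reducible control flow that the abstract describes.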