Space-Time Trade-Offs in Structured Programming: An Improved Combinatorial Embedding Theorem
Journal of the ACM (JACM)
Conversion of control dependence to data dependence
POPL '83 Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Weakest-precondition of unstructured programs
PASTE '05 Proceedings of the 6th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Compilers: Principles, Techniques, and Tools (2nd Edition)
Compilers: Principles, Techniques, and Tools (2nd Edition)
Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Accelerating the local outlier factor algorithm on a GPU for intrusion detection systems
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
TACAS'08/ETAPS'08 Proceedings of the Theory and practice of software, 14th international conference on Tools and algorithms for the construction and analysis of systems
Scalable SMT-based verification of GPU kernel functions
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Boogie: a modular reusable verifier for object-oriented programs
FMCO'05 Proceedings of the 4th international conference on Formal Methods for Components and Objects
GKLEE: concolic verification and test generation for GPUs
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
A GPU-based high-throughput image retrieval algorithm
Proceedings of the 5th Annual Workshop on General Purpose Processing with Graphics Processing Units
Verifying GPU kernels by test amplification
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
On the correctness of the SIMT execution model of GPUs
ESOP'12 Proceedings of the 21st European conference on Programming Languages and Systems
GPUVerify: a verifier for GPU kernels
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Symbolic testing of OpenCL code
HVC'11 Proceedings of the 7th international Haifa Verification conference on Hardware and Software: verification and testing
Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
A sound and complete abstraction for reasoning about parallel prefix sums
Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
Race directed scheduling of concurrent programs
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
We study semantics of GPU kernels -- the parallel programs that run on Graphics Processing Units (GPUs). We provide a novel lock-step execution semantics for GPU kernels represented by arbitrary reducible control flow graphs and compare this semantics with a traditional interleaving semantics. We show for terminating kernels that either both semantics compute identical results or both behave erroneously. The result induces a method that allows GPU kernels with arbitrary reducible control flow graphs to be verified via transformation to a sequential program that employs predicated execution. We implemented this method in the GPUVerify tool and experimentally evaluated it by comparing the tool with the previous version of the tool based on a similar method for structured programs, i.e., where control is organised using if and while statements. The evaluation was based on a set of 163 open source and commercial GPU kernels. Among these kernels, 42 exhibit unstructured control flow which our novel method can handle fully automatically, but the previous method could not. Overall the generality of the new method comes at a modest price: Verification across our benchmark set was 2.25 times slower overall; however, the median slow down across all kernels was 0.77, indicating that our novel technique yielded faster analysis in many cases.