Scans as Primitive Parallel Operations. IEEE Transactions on Computers.
The art of computer programming, volume 3: sorting and searching (2nd ed.).
Journal of the ACM (JACM).
Using MPI: portable parallel programming with the message-passing interface (2nd ed.).
Types and programming languages.
POPL '77 Proceedings of the 4th ACM SIGACT-SIGPLAN symposium on Principles of programming languages.
Scan primitives for GPU computing. Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware.
Parallel Processing with the Perfect Shuffle. IEEE Transactions on Computers.
A Regular Layout for Parallel Adders. IEEE Transactions on Computers.
Much ado about two (pearl): a pearl on parallel prefix computation. Proceedings of the 35th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages.
Efficient stream compaction on wide SIMD many-core architectures. Proceedings of the Conference on High Performance Graphics 2009.
Designing efficient sorting algorithms for manycore GPUs. IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing.
A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations. IEEE Transactions on Computers.
Rodinia: A benchmark suite for heterogeneous computing. IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC).
The Scalable Heterogeneous Computing (SHOC) benchmark suite. Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units.
KLEE: unassisted and automatic generation of high-coverage tests for complex systems programs. OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation.
Scalable SMT-based verification of GPU kernel functions. Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering.
Functional and dynamic programming in the design of parallel prefix networks. Journal of Functional Programming.
GKLEE: concolic verification and test generation for GPUs. Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming.
Verifying GPU kernels by test amplification. Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation.
On the correctness of the SIMT execution model of GPUs. ESOP'12 Proceedings of the 21st European conference on Programming Languages and Systems.
GPUVerify: a verifier for GPU kernels. Proceedings of the ACM international conference on Object oriented programming systems languages and applications.
Symbolic testing of OpenCL code. HVC'11 Proceedings of the 7th international Haifa Verification conference on Hardware and Software: verification and testing.
Interleaving and lock-step semantics for analysis and verification of GPU kernels. ESOP'13 Proceedings of the 22nd European conference on Programming Languages and Systems.
Barrier invariants: a shared state abstraction for the analysis of data-dependent GPU kernels. Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications.
Prefix sums are key building blocks in the implementation of many concurrent software applications, and much recent work has gone into implementing prefix sums efficiently on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance. We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid and prove a soundness and completeness result: a generic sequential prefix sum implementation is correct for an array of length $n$ if and only if it computes the correct result for a specific test case when instantiated with the interval of summations monoid. Correctness can therefore be established by running a single test whose input and result require $O(n \lg n)$ space. This improves upon an existing result by Sheeran, where the input requires $O(n \lg n)$ space and the result $O(n^2 \lg n)$ space, and is more feasible for large $n$ than a method by Voigtlaender that uses $O(n)$ space for the input and result but requires running $O(n^2)$ tests. We then extend our abstraction and results to data-parallel programs, developing an automated verification method for GPU implementations of prefix sums. Our method uses static verification to prove that a generic prefix sum implementation is data-race-free, after which its functional correctness can be determined by running a single test case under the interval of summations abstraction. We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtlaender's method when applied to large arrays.
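To make the single-test idea concrete, here is a minimal Python sketch. It assumes one plausible encoding of the interval of summations, since the abstract does not spell the monoid out: an element `(i, j)` stands for the summation x_i ⊕ ... ⊕ x_j, `None` is the identity, and `TOP` is an absorbing error element produced whenever two non-adjacent intervals are combined. All names (`combine`, `sequential_scan`, `kogge_stone_scan`, `swapped_scan`, `passes_interval_test`) are hypothetical. Each scan is "generic" in the sense that it touches elements only through the operator it is given. The test feeds in the singleton intervals `(0, 0), ..., (n-1, n-1)` and expects `(0, k)` at position `k`. Each element holds two indices of about lg n bits, which is consistent with the $O(n \lg n)$ space bound. `kogge_stone_scan` is only a sequential simulation of parallel rounds. It says nothing about data races, which the abstract handles separately by static verification.

```python
from typing import Callable, List, Tuple, Union

# Hypothetical encoding (an assumption, not the paper's notation):
#   None    -> identity element of the monoid
#   TOP     -> absorbing "error" element
#   (i, j)  -> the summation x_i (+) ... (+) x_j over a contiguous index range
TOP = "TOP"
Elem = Union[None, str, Tuple[int, int]]
Op = Callable[[Elem, Elem], Elem]


def combine(a: Elem, b: Elem) -> Elem:
    """Monoid operator: adjacent intervals merge, anything else becomes TOP."""
    if a is None:
        return b
    if b is None:
        return a
    if a == TOP or b == TOP:
        return TOP
    (i, j), (k, l) = a, b
    return (i, l) if j + 1 == k else TOP


def sequential_scan(xs: List[Elem], op: Op) -> List[Elem]:
    """Inclusive prefix sum that accesses elements only through `op`."""
    out = list(xs)
    for k in range(1, len(out)):
        out[k] = op(out[k - 1], out[k])
    return out


def kogge_stone_scan(xs: List[Elem], op: Op) -> List[Elem]:
    """Sequential simulation of Kogge-Stone rounds: each round reads only the
    previous round's values, as a barrier between rounds would ensure."""
    out = list(xs)
    d = 1
    while d < len(out):
        out = [op(out[k - d], out[k]) if k >= d else out[k] for k in range(len(out))]
        d *= 2
    return out


def swapped_scan(xs: List[Elem], op: Op) -> List[Elem]:
    """Deliberately buggy: operands are passed to `op` in the wrong order."""
    out = list(xs)
    for k in range(1, len(out)):
        out[k] = op(out[k], out[k - 1])
    return out


def passes_interval_test(scan: Callable[[List[Elem], Op], List[Elem]], n: int) -> bool:
    """Run the single test: singleton intervals in, prefixes (0, k) expected out."""
    xs: List[Elem] = [(k, k) for k in range(n)]
    return scan(xs, combine) == [(0, k) for k in range(n)]
```

With ordinary integers, `swapped_scan([1, 2, 3], lambda a, b: a + b)` returns `[1, 3, 6]` because addition is commutative, so an integer test would miss the operand-order bug. Under the interval encoding, `combine((1, 1), (0, 0))` yields `TOP`, and `passes_interval_test(swapped_scan, n)` fails for any `n >= 2`.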