A sound and complete abstraction for reasoning about parallel prefix sums

Authors:
Nathan Chong;Alastair F. Donaldson;Jeroen Ketema
Affiliations:
Imperial College London, London, United Kingdom;Imperial College London, London, United Kingdom;Imperial College London, London, United Kingdom
Venue:
Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
Year:
2014

Abstract

Prefix sums are key building blocks in the implementation of many concurrent software applications, and recently much work has gone into efficiently implementing prefix sums to run on massively parallel graphics processing units (GPUs). Because they lie at the heart of many GPU-accelerated applications, the correctness of prefix sum implementations is of prime importance. We introduce a novel abstraction, the interval of summations, that allows scalable reasoning about implementations of prefix sums. We present this abstraction as a monoid, and prove a soundness and completeness result showing that a generic sequential prefix sum implementation is correct for an array of length $n$ if and only if it computes the correct result for a specific test case when instantiated with the interval of summations monoid. This allows correctness to be established by running a single test where the input and result require $O(n \lg(n))$ space. This improves upon an existing result by Sheeran where the input requires $O(n \lg(n))$ space and the result $O(n^2 \lg(n))$ space, and is more feasible for large $n$ than a method by Voigtlaender that uses $O(n)$ space for the input and result but requires running $O(n^2)$ tests. We then extend our abstraction and results to the context of data-parallel programs, developing an automated verification method for GPU implementations of prefix sums. Our method uses static verification to prove that a generic prefix sum implementation is data race-free, after which functional correctness of the implementation can be determined by running a single test case under the interval of summations abstraction. We present an experimental evaluation using four different prefix sum algorithms, showing that our method is highly automatic, scales to large thread counts, and significantly outperforms Voigtlaender's method when applied to large arrays.
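
The single-test idea can be illustrated with a minimal Python sketch. The abstract does not give the monoid's definition, so the encoding below is an assumption made for illustration: an element (i, j) stands for the contiguous summation of inputs i through j, combining two elements succeeds only when the intervals are adjacent and in order, and anything else collapses to an absorbing error element. All names (`combine`, `IDENTITY`, `TOP`, `passes_interval_test`, and the three scan functions) are hypothetical and not taken from the paper.

```python
# Assumed encoding of the abstraction (not the paper's own notation):
#   (i, j)    the summation x_i + ... + x_j, with i <= j
#   IDENTITY  the unit of the monoid
#   TOP       an absorbing element marking an invalid summation
IDENTITY = "identity"
TOP = "top"


def combine(a, b):
    """Combine two abstract elements; only adjacent intervals merge."""
    if a == TOP or b == TOP:
        return TOP
    if a == IDENTITY:
        return b
    if b == IDENTITY:
        return a
    (i, j), (k, l) = a, b
    return (i, l) if j + 1 == k else TOP


def sequential_scan(xs, op):
    """Generic inclusive prefix sum, left to right."""
    out, acc = [], None
    for x in xs:
        acc = x if acc is None else op(acc, x)
        out.append(acc)
    return out


def log_step_scan(xs, op):
    """Generic inclusive prefix sum with doubling offsets, simulated
    sequentially (the shape often used on data-parallel hardware)."""
    cur, offset = list(xs), 1
    while offset < len(cur):
        nxt = list(cur)
        for i in range(offset, len(cur)):
            nxt[i] = op(cur[i - offset], cur[i])
        cur, offset = nxt, offset * 2
    return cur


def swapped_scan(xs, op):
    """Deliberately buggy: passes operands in the wrong order, which is
    invisible for commutative operators such as integer addition."""
    out, acc = [], None
    for x in xs:
        acc = x if acc is None else op(x, acc)
        out.append(acc)
    return out


def passes_interval_test(scan, n):
    """Run the one test: singleton intervals in, prefixes (0, i) out."""
    inputs = [(i, i) for i in range(n)]
    expected = [(0, i) for i in range(n)]
    return scan(inputs, combine) == expected
```

Under these assumptions, `passes_interval_test(sequential_scan, n)` and `passes_interval_test(log_step_scan, n)` hold, while `swapped_scan` is rejected even though it produces the right answers for integer addition. Each element stores two indices below n, which is consistent with the $O(n \lg(n))$ space figure quoted for the input and result.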