Internally deterministic parallel algorithms can be fast

Authors:
Guy E. Blelloch;Jeremy T. Fineman;Phillip B. Gibbons;Julian Shun
Affiliations:
Carnegie Mellon University, Pittsburgh, USA;Georgetown University, Washington, D.C., USA;Intel Labs, Pittsburgh, Pittsburgh, USA;Carnegie Mellon University, Pittsburgh, USA
Venue:
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Year:
2012

Citing 43
Cited 10

MULTILISP: a language for concurrent symbolic computation

ACM Transactions on Programming Languages and Systems (TOPLAS)
Commutativity-Based Concurrency Control for Abstract Data Types

IEEE Transactions on Computers
A more practical PRAM model

SPAA '89 Proceedings of the first annual ACM symposium on Parallel algorithms and architectures
Linearizability: a correctness condition for concurrent objects

ACM Transactions on Programming Languages and Systems (TOPLAS)
Heuristics for ray tracing using space subdivision

The Visual Computer: International Journal of Computer Graphics
Making asynchronous parallelism safe for the world

POPL '90 Proceedings of the 17th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
What are race conditions?: Some issues and formalizations

ACM Letters on Programming Languages and Systems (LOPLAS)
A decomposition of multidimensional point sets with applications to k-nearest-neighbors and n-body potential fields

Journal of the ACM (JACM)
Programming parallel algorithms

Communications of the ACM
A provable time and space efficient implementation of NESL

Proceedings of the first ACM SIGPLAN international conference on Functional programming
Cilk: an efficient multithreaded runtime system

Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Commutativity analysis: a new analysis technique for parallelizing compilers

ACM Transactions on Programming Languages and Systems (TOPLAS)
Detecting data races in Cilk programs that use locks

Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Weak ordering—a new definition

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Memory consistency and event ordering in scalable shared-memory multiprocessors

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
Introduction to Algorithms

Introduction to Algorithms
Cooperating Sequential Processes, Technical Report EWD-123

Cooperating Sequential Processes, Technical Report EWD-123
Strongly History-Independent Hashing with Applications

FOCS '07 Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science
Closure properties of interconnections of determinate systems

Record of the Project MAC conference on concurrent systems and parallel computation
Transactional boosting: a methodology for highly-concurrent transactional objects

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Computational Geometry: Algorithms and Applications

Computational Geometry: Algorithms and Applications
DMP: deterministic shared memory multiprocessing

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Kendo: efficient deterministic multithreading in software

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
A case for an interleaving constrained shared-memory multi-processor

Proceedings of the 36th annual international symposium on Computer architecture
Grace: safe multithreaded programming for C/C++

Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
CoreDet: a compiler and runtime system for deterministic multithreaded execution

Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
The Cilk++ concurrency platform

The Journal of Supercomputing
Simple linear work suffix array construction

ICALP'03 Proceedings of the 30th international conference on Automata, languages and programming
Low depth cache-oblivious algorithms

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers)

Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Parallel programming must be deterministic by default

HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Parallel SAH k-D tree construction

Proceedings of the Conference on High Performance Graphics
Deterministic process groups in dOS

OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Safe nondeterminism in a deterministic-by-default parallel language

Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
RCDC: a relaxed consistency deterministic computer

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Commutative set: a language extension for implicit parallel programming

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
The tao of parallelism in algorithms

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Exploiting the commutativity lattice

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Calvin: Deterministic or not? Free will to choose

HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
Deterministic parallel random-number generation for dynamic-multithreading platforms

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
MCSTL: the multi-core standard template library

Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing

Deterministic parallel random-number generation for dynamic-multithreading platforms

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Brief announcement: the problem based benchmark suite

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Parallel and I/O efficient set covering algorithms

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Greedy sequential maximal independent set and matching are parallel on average

Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
Scheduling parallel programs by work stealing with private deques

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Reducing contention through priority updates

Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Atomic-free irregular computations on GPUs

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Reducing contention through priority updates

Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Program-centric cost models for locality

Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Deterministic galois: on-demand, portable and parameterless

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The virtues of deterministic parallelism have been argued for decades and many forms of deterministic parallelism have been described and analyzed. Here we are concerned with one of the strongest forms, requiring that for any input there is a unique dependence graph representing a trace of the computation annotated with every operation and value. This has been referred to as internal determinism, and implies a sequential semantics---i.e., considering any sequential traversal of the dependence graph is sufficient for analyzing the correctness of the code. In addition to returning deterministic results, internal determinism has many advantages including ease of reasoning about the code, ease of verifying correctness, ease of debugging, ease of defining invariants, ease of defining good coverage for testing, and ease of formally, informally and experimentally reasoning about performance. On the other hand one needs to consider the possible downsides of determinism, which might include making algorithms (i) more complicated, unnatural or special purpose and/or (ii) slower or less scalable. In this paper we study the effectiveness of this strong form of determinism through a broad set of benchmark problems. Our main contribution is to demonstrate that for this wide body of problems, there exist efficient internally deterministic algorithms, and moreover that these algorithms are natural to reason about and not complicated to code. We leverage an approach to determinism suggested by Steele (1990), which is to use nested parallelism with commutative operations. Our algorithms apply several diverse programming paradigms that fit within the model including (i) a strict functional style (no shared state among concurrent operations), (ii) an approach we refer to as deterministic reservations, and (iii) the use of commutative, linearizable operations on data structures. We describe algorithms for the benchmark problems that use these deterministic approaches and present performance results on a 32-core machine. Perhaps surprisingly, for all problems, our internally deterministic algorithms achieve good speedup and good performance even relative to prior nondeterministic solutions.