Deterministic parallel random-number generation for dynamic-multithreading platforms

Authors:
Charles E. Leiserson; Tao B. Schardl; Jim Sukha
Affiliations:
MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA; MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, USA
Venue:
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Year:
2012

Abstract

Existing concurrency platforms for dynamic multithreading do not provide repeatable parallel random-number generators. This paper proposes that a mechanism called pedigrees be built into the runtime system to enable efficient deterministic parallel random-number generation. Experiments with the open-source MIT Cilk runtime system show that the overhead for maintaining pedigrees is negligible. Specifically, on a suite of 10 benchmarks, the overhead of Cilk with pedigrees relative to the original Cilk has a geometric mean of less than 1%. We persuaded Intel to modify its commercial C/C++ compiler, which provides the Cilk Plus concurrency platform, to include pedigrees, and we built a library implementation of a deterministic parallel random-number generator called DotMix that compresses the pedigree and then "RC6-mixes" the result. The statistical quality of DotMix is comparable to that of the popular Mersenne twister, but DotMix is somewhat slower than a nondeterministic parallel version of this efficient and high-quality serial random-number generator. The cost of calling DotMix depends on the "spawn depth" of the invocation. For a naive Fibonacci calculation with n=40 that calls DotMix in every node of the computation, this "price of determinism" is a factor of 2.65 in running time, but for more realistic applications with less intense use of random numbers (such as a maximal-independent-set algorithm, a practical samplesort program, and a Monte Carlo discrete-hedging application from QuantLib) the observed "price" was less than 5%. Moreover, even if overheads were several times greater, applications using DotMix should be amply fast for debugging purposes, which is a major reason for desiring repeatability.
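
The abstract describes DotMix only as compressing a pedigree and then "RC6-mixing" the result. The following is a minimal Python sketch of that two-stage idea, not the authors' implementation. It assumes a pedigree can be modeled as a list of nonnegative integer ranks, one per spawn level; that compression is a dot product with a seeded table of random coefficients modulo a prime; and that mixing applies a few rounds of the RC6-style map z(2z+1) mod 2^64 followed by a swap of the 32-bit halves. The names make_table, compress, mix, and dotmix_like, and the constants P, ROUNDS, and MAX_DEPTH, are hypothetical choices made for illustration.

```python
import random

P = (1 << 64) - 59      # assumed modulus: the largest prime below 2**64
MASK = (1 << 64) - 1    # keeps values within 64 bits
ROUNDS = 4              # assumed number of mixing rounds
MAX_DEPTH = 64          # assumed bound on pedigree length (spawn depth)


def make_table(seed, max_depth=MAX_DEPTH):
    """Derive one nonzero coefficient per pedigree position from a seed."""
    rng = random.Random(seed)
    return [rng.randrange(1, P) for _ in range(max_depth)]


def compress(pedigree, table):
    """Compress a pedigree to one value with a dot product modulo P.

    Each rank is incremented by one (an assumption of this sketch) so that
    a rank of zero still contributes to the sum.
    """
    if len(pedigree) > len(table):
        raise ValueError("pedigree is deeper than the coefficient table")
    return sum(g * (j + 1) for g, j in zip(table, pedigree)) % P


def mix(z, rounds=ROUNDS):
    """Apply rounds of the RC6-style map z*(2z+1) mod 2**64, then swap halves."""
    for _ in range(rounds):
        z = (z * (2 * z + 1)) & MASK
        z = ((z >> 32) | (z << 32)) & MASK
    return z


def dotmix_like(pedigree, table):
    """Return a 64-bit value that depends only on the pedigree and the table."""
    return mix(compress(pedigree, table))


if __name__ == "__main__":
    table = make_table(seed=42)
    # The same pedigree and seed give the same value on every run,
    # regardless of how the computation was scheduled.
    print(hex(dotmix_like([0, 3, 1], table)))
```

In this sketch the dot product takes one multiply-add per pedigree entry, which is consistent with the abstract's remark that the cost of a call depends on the spawn depth of the invocation.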