MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Resource requirements of dataflow programs
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
I-structures: data structures for parallel computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Mul-T: a high-performance parallel Lisp
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Introduction to algorithms
A report on the Sisal language project
Journal of Parallel and Distributed Computing - Special issue: data-flow processing
Implementation of a portable nested data-parallel language
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
Randomized algorithms
Provably efficient scheduling for languages with fine-grained parallelism
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
An analysis of dag-consistent distributed shared-memory algorithms
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
A provably time-efficient parallel implementation of full speculation
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Guaranteeing Good Memory Bounds for Parallel Programs
IEEE Transactions on Software Engineering
Executing multithreaded programs efficiently
Space-efficient scheduling of parallelism with synchronization variables
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Space-efficient implementation of nested parallelism
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Provably efficient scheduling for languages with fine-grained parallelism
Journal of the ACM (JACM)
Scheduling threads for low space requirement and good locality
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
Pthreads for dynamic and irregular parallelism
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Compositional C++: Compositional Parallel Programming
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Prioritization in Parallel Symbolic Computing
Proceedings of the US/Japan Workshop on Parallel Symbolic Computing: Languages, Systems, and Applications
Space-efficient scheduling for parallel, multithreaded computations
Effectively sharing a cache among threads
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
We present an efficient, randomized, online scheduling algorithm for a large class of programs with write-once synchronization variables. The algorithm combines the work-stealing paradigm with the depth-first scheduling technique, resulting in high space efficiency and good time complexity. By automatically increasing the granularity of the work scheduled on each processor, our algorithm achieves good locality, low contention, and low scheduling overhead, improving upon a previous depth-first scheduling algorithm [6] published in SPAA '97. Moreover, it is provably efficient for the general class of multithreaded computations with write-once synchronization variables (as studied in [6]), improving upon algorithm DFDeques (published in SPAA '99 [24]), which applies only to the more restricted class of nested-parallel computations.

More specifically, consider such a computation with work T1, depth T∞, and σ synchronizations, and suppose that space S1 suffices to execute the computation on a single-processor machine. Then, on a P-processor shared-memory parallel machine, the expected space complexity of our algorithm is at most S1 + O(P T∞ log(P T∞)), and its expected time complexity is O(T1/P + σ log(P T∞)/P + T∞ log(P T∞)). Moreover, for any ε > 0, the space complexity of our algorithm is S1 + O(P(T∞ + ln(1/ε)) log(P(T∞ + ln(1/ε)))) with probability at least 1 − ε. Thus, even for values of ε as small as e^(−T∞), the space complexity of our algorithm is at most S1 + O(P T∞ log(P T∞)) with probability at least 1 − e^(−T∞). These bounds include all time and space costs for both the computation and the scheduler.
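The combination the abstract describes can be illustrated with a minimal sketch: each processor owns a deque, works depth-first off the bottom (LIFO), and an idle processor steals the oldest task from the top of another deque, which tends to be a large, shallow subcomputation and so increases the granularity of stolen work. This is only an illustrative toy, not the paper's algorithm: the `Worker` and `run` names are hypothetical, the victim choice is deterministic here rather than randomized, and synchronization variables are omitted entirely.

```python
from collections import deque

class Worker:
    """A processor with its own task deque.
    Owner pushes/pops at the bottom (depth-first order);
    thieves remove the oldest task from the top."""
    def __init__(self, wid):
        self.wid = wid
        self.tasks = deque()

    def push(self, task):
        self.tasks.append(task)          # bottom of the deque

    def pop(self):
        # Owner works depth-first: take the most recently pushed task.
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):
        # Thief takes the oldest (shallowest) task from the victim's top.
        return victim.tasks.popleft() if victim.tasks else None

def run(workers, log):
    """Toy round-robin driver: each worker pops its own work, or steals.
    (The real algorithm picks victims randomly; this is deterministic
    so the trace below is reproducible.)"""
    while any(w.tasks for w in workers):
        for w in workers:
            task = w.pop()
            if task is None:
                for v in workers:
                    if v is not w and v.tasks:
                        task = w.steal_from(v)
                        break
            if task is not None:
                log.append((w.wid, task))

# Worker 0 spawns tasks a, b, c; worker 1 starts idle and steals.
w0, w1 = Worker(0), Worker(1)
for t in ("a", "b", "c"):
    w0.push(t)
trace = []
run([w0, w1], trace)
# Worker 0 executes its newest tasks first (c, then b),
# while worker 1 steals the oldest task (a) from the top.
print(trace)   # [(0, 'c'), (1, 'a'), (0, 'b')]
```

Note how the depth-first (bottom-of-deque) discipline keeps the owner's working set small, while top-of-deque steals hand thieves coarse-grained work; these are the two properties the abstract credits for the space bound and the low scheduling overhead.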