MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Resource requirements of dataflow programs
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
I-structures: data structures for parallel computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Mul-T: a high-performance parallel Lisp
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Introduction to algorithms
A report on the Sisal language project
Journal of Parallel and Distributed Computing - Special issue: data-flow processing
Implementation of a portable nested data-parallel language
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
Randomized algorithms
Provably efficient scheduling for languages with fine-grained parallelism
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
An analysis of dag-consistent distributed shared-memory algorithms
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
A provably time-efficient parallel implementation of full speculation
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Guaranteeing Good Memory Bounds for Parallel Programs
IEEE Transactions on Software Engineering
Executing multithreaded programs efficiently
Space-efficient scheduling of parallelism with synchronization variables
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Space-efficient implementation of nested parallelism
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Provably efficient scheduling for languages with fine-grained parallelism
Journal of the ACM (JACM)
Scheduling threads for low space requirement and good locality
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
Pthreads for dynamic and irregular parallelism
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Compositional C++: Compositional Parallel Programming
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Prioritization in Parallel Symbolic Computing
Proceedings of the US/Japan Workshop on Parallel Symbolic Computing: Languages, Systems, and Applications
Space-efficient scheduling for parallel, multithreaded computations
Effectively sharing a cache among threads
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
We present an efficient, randomized, online scheduling algorithm for a large class of programs with write-once synchronization variables. The algorithm combines the work-stealing paradigm with the depth-first scheduling technique, resulting in high space efficiency and good time complexity. By automatically increasing the granularity of the work scheduled on each processor, our algorithm achieves good locality, low contention, and low scheduling overhead, improving upon a previous depth-first scheduling algorithm [6] published in SPAA '97. Moreover, it is provably efficient for the general class of multithreaded computations with write-once synchronization variables (as studied in [6]), improving upon algorithm DFDeques (published in SPAA '99 [24]), which applies only to the more restricted class of nested-parallel computations.

More specifically, consider such a computation with work T1, depth T∞, and σ synchronizations, and suppose that space S1 suffices to execute the computation on a single-processor machine. Then, on a P-processor shared-memory parallel machine, the expected space complexity of our algorithm is at most S1 + O(P T∞ log(P T∞)), and its expected time complexity is O(T1/P + σ log(P T∞)/P + T∞ log(P T∞)). Moreover, for any ε > 0, the space complexity of our algorithm is S1 + O(P(T∞ + ln(1/ε)) log(P(T∞ + ln(1/ε)))) with probability at least 1 − ε. Thus, even for values of ε as small as e^(−T∞), the space complexity of our algorithm is at most S1 + O(P T∞ log(P T∞)) with probability at least 1 − e^(−T∞). These bounds include all time and space costs for both the computation and the scheduler.
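The combination the abstract describes can be illustrated with a minimal sketch: each processor owns a deque, works depth-first off the bottom (LIFO), and an idle processor steals the oldest task from the top of another deque, which tends to be a large, shallow subcomputation and so increases the granularity of stolen work. This is only an illustrative toy, not the paper's algorithm: the `Worker` and `run` names are hypothetical, the victim choice is deterministic here rather than randomized, and synchronization variables are omitted entirely.

```python
from collections import deque

class Worker:
    """A processor with its own task deque.
    Owner pushes/pops at the bottom (depth-first order);
    thieves remove the oldest task from the top."""
    def __init__(self, wid):
        self.wid = wid
        self.tasks = deque()

    def push(self, task):
        self.tasks.append(task)          # bottom of the deque

    def pop(self):
        # Owner works depth-first: take the most recently pushed task.
        return self.tasks.pop() if self.tasks else None

    def steal_from(self, victim):
        # Thief takes the oldest (shallowest) task from the victim's top.
        return victim.tasks.popleft() if victim.tasks else None

def run(workers, log):
    """Toy round-robin driver: each worker pops its own work, or steals.
    (The real algorithm picks victims randomly; this is deterministic
    so the trace below is reproducible.)"""
    while any(w.tasks for w in workers):
        for w in workers:
            task = w.pop()
            if task is None:
                for v in workers:
                    if v is not w and v.tasks:
                        task = w.steal_from(v)
                        break
            if task is not None:
                log.append((w.wid, task))

# Worker 0 spawns tasks a, b, c; worker 1 starts idle and steals.
w0, w1 = Worker(0), Worker(1)
for t in ("a", "b", "c"):
    w0.push(t)
trace = []
run([w0, w1], trace)
# Worker 0 executes its newest tasks first (c, then b),
# while worker 1 steals the oldest task (a) from the top.
print(trace)   # [(0, 'c'), (1, 'a'), (0, 'b')]
```

Note how the depth-first (bottom-of-deque) discipline keeps the owner's working set small, while top-of-deque steals hand thieves coarse-grained work; these are the two properties the abstract credits for the space bound and the low scheduling overhead.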