This paper studies the runtime behaviour of several parallel divide-and-conquer algorithms written in a nonstrict functional language under three common granularity control mechanisms: a simple cut-off, priority thread creation, and priority scheduling. These mechanisms use granularity information, currently supplied via annotations, to improve the performance of the parallel programs. The programs examined are several variants of a generic divide-and-conquer program, an unbalanced divide-and-conquer algorithm, and a parallel determinant computation. Our results indicate that for balanced computation trees a simple, low-overhead mechanism performs well, whereas the more complex mechanisms offer further improvements for unbalanced computation trees.
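To make the simplest of the three mechanisms concrete, the following is a minimal sketch of a cut-off in a divide-and-conquer sum. It is an illustration only, not the paper's implementation: the paper works in a nonstrict functional language with granularity annotations, whereas this sketch uses Python threads, and the names dc_sum, cutoff and spawned are hypothetical. Here the size of the sub-problem stands in for the granularity information; sub-problems at or below the cut-off are evaluated sequentially, and only larger ones create a new thread.

```python
import threading


def dc_sum(xs, cutoff):
    """Sum xs by divide-and-conquer, creating threads only above a size cut-off.

    Returns (total, spawned), where spawned counts the threads created.
    The sub-problem size is used as an assumed stand-in for granularity.
    """
    spawned = 0
    lock = threading.Lock()

    def go(lo, hi):
        nonlocal spawned
        size = hi - lo
        if size == 0:
            return 0
        if size == 1:
            return xs[lo]
        mid = (lo + hi) // 2
        if size <= cutoff:
            # Small task: thread creation overhead is not worth paying,
            # so both halves are evaluated in the current thread.
            return go(lo, mid) + go(mid, hi)
        # Large task: evaluate the left half in a new thread while the
        # current thread continues with the right half.
        with lock:
            spawned += 1
        box = []
        worker = threading.Thread(target=lambda: box.append(go(lo, mid)))
        worker.start()
        right = go(mid, hi)
        worker.join()
        return box[0] + right

    return go(0, len(xs)), spawned


if __name__ == "__main__":
    total, spawned = dc_sum(list(range(16)), cutoff=4)
    print(total, spawned)
```

For a balanced input of 16 elements and a cut-off of 4, this sketch creates threads only at the top two levels of the computation tree (three threads in total); a cut-off of 16 makes the whole computation sequential. The priority-based mechanisms named in the abstract are not modelled here.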