Suppose that a parallel algorithm may include any number of parallel threads. Each thread proceeds without ever having to busy-wait on another thread. A thread can run until it terminates, but no new threads can be formed. What kinds of problems can such restrictive algorithms solve while remaining competitive, in the total number of operations they perform, with the fastest serial algorithm for the same problem?

Intrigued by this informal question, we considered one of the most elementary parallel algorithmic paradigms, that of balanced binary trees. The main contribution of this paper is a new no-busy-wait paradigm for parallel algorithms based on balanced (not necessarily binary) trees. Applications of the basic paradigm to two problems are presented: building heaps, and executing parallel tree contraction (assuming a preparatory stage); the latter is known to be applicable to evaluating a family of general arithmetic expressions.

To put things in context, we also discuss our “PRAM-on-chip” vision (actually a small update to it), presented at SPAA98.
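
As an illustration of the constraints described above, the following Python sketch sums the leaves of an implicit balanced binary tree. It is not taken from the paper; it is one plausible reading of the no-busy-wait rule. All threads are created up front, one per leaf. Each thread climbs toward the root, and at every internal node the first thread to arrive terminates instead of waiting for its sibling, while the second thread combines both children and continues. The function name `tree_sum_no_busy_wait`, the heap-style array layout, and the per-node `threading.Lock` (standing in for an atomic fetch-and-add primitive) are assumptions made for this example.

```python
import threading


def tree_sum_no_busy_wait(values):
    """Sum `values` over an implicit balanced binary tree (illustrative sketch)."""
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two input size"

    # Heap-style layout: node 1 is the root, leaves occupy indices n .. 2n-1.
    partial = [0] * n + list(values)
    arrivals = [0] * n  # arrival counter per internal node (index 0 unused)
    # Assumption: a lock per node stands in for an atomic fetch-and-add.
    locks = [threading.Lock() for _ in range(n)]

    def climb(leaf):
        node = leaf
        while node > 1:
            parent = node // 2
            with locks[parent]:
                first = arrivals[parent] == 0
                arrivals[parent] += 1
            if first:
                # The sibling subtree is not finished yet: terminate, do not wait.
                return
            # Second arrival: both children are complete, so combine and go on.
            partial[parent] = partial[2 * parent] + partial[2 * parent + 1]
            node = parent

    # All threads exist from the start; none is created during the computation.
    threads = [threading.Thread(target=climb, args=(n + i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # harness only: collect the result after every thread has ended
    return partial[1]
```

Each internal node is combined exactly once, so the total work is n - 1 additions, the same as a serial sum. The final `join` belongs to the test harness that reads the result, not to the threads themselves.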