The Strict Time Lower Bound and Optimal Schedules for Parallel Prefix with Resource Constraints

Authors:
Haigeng Wang; Alexandru Nicolau; Kai-Yeng S. Siu
Venue:
IEEE Transactions on Computers
Year:
1996

Abstract

Prefix computation is a basic operation at the core of many important applications, e.g., some of the Grand Challenge problems, circuit design, digital signal processing, graph optimizations, and computational geometry. In this paper, we present new and strict time-optimal parallel schedules for prefix computation with resource constraints under the concurrent-read-exclusive-write (CREW) parallel random access machine (PRAM) model. For prefix of N elements on p processors (p independent of N) when N > p(p + 1)/2, we derive Harmonic Schedules that achieve the strict optimal time (steps), $\lceil 2(N-1)/(p+1) \rceil$. We also derive Pipelined Schedules that have better program-space efficiency than the Harmonic Schedules, yet require only a small constant number of steps more than the optimal time achieved by the Harmonic Schedules. Both the Harmonic Schedules and the Pipelined Schedules are simple and easy to implement. For prefix of N elements on p processors (p independent of N) where N ≤ p(p + 1)/2, the Harmonic Schedules are not time-optimal. For these cases, we establish an optimization method for determining key parameters of time-optimal schedules, based on connections between the structure of parallel prefix and Pascal's triangle. Using the derived parameters, we devise an algorithm to construct such schedules. For a restricted class of values of N and p, we prove that the constructed schedules are strictly time-optimal. We also give strong empirical evidence that our algorithm constructs strict time-optimal schedules for all cases where N ≤ p(p + 1)/2.
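As a quick illustration of the step count stated in the abstract, the following minimal Python sketch evaluates $\lceil 2(N-1)/(p+1) \rceil$ for the case where N exceeds p(p + 1)/2. The function names `in_harmonic_regime` and `harmonic_optimal_steps` are hypothetical and chosen only for this example; the sketch evaluates the closed-form bound and does not construct a Harmonic Schedule. For the other case the abstract gives no closed form, so the sketch returns `None` there.

```python
def in_harmonic_regime(n: int, p: int) -> bool:
    """True when N > p(p + 1)/2, the case the abstract covers with the closed form."""
    # p * (p + 1) is always even, so integer division is exact here.
    return n > p * (p + 1) // 2


def harmonic_optimal_steps(n: int, p: int):
    """Return ceil(2(N - 1)/(p + 1)) when N > p(p + 1)/2, else None.

    Hypothetical helper: it only evaluates the formula quoted in the
    abstract and says nothing about how the schedule itself is built.
    """
    if n < 1 or p < 1:
        raise ValueError("n and p must be positive integers")
    if not in_harmonic_regime(n, p):
        # No closed form is stated in the abstract for this case.
        return None
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-2 * (n - 1) // (p + 1))


if __name__ == "__main__":
    print(harmonic_optimal_steps(100, 4))  # 2 * 99 / 5 = 39.6, rounded up
    print(harmonic_optimal_steps(8, 1))    # one processor: N - 1 steps
    print(harmonic_optimal_steps(10, 4))   # 10 <= 4 * 5 / 2, so None
```

With a single processor the formula reduces to N - 1, which matches the number of operations in a sequential prefix computation and serves as a simple sanity check.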