A Heuristic for Suffix Solutions
IEEE Transactions on Computers
Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Depth-size trade-offs for parallel prefix computation
Journal of Algorithms
Faster optimal parallel prefix sums and list ranking
Information and Computation
Optimal schedules for parallel prefix computation with bounded resources
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Introduction to parallel algorithms and architectures: array, trees, hypercubes
Introduction to parallel algorithms and architectures: array, trees, hypercubes
Highly parallel computing
Parallel computing using the prefix problem
Parallel computing using the prefix problem
Journal of the ACM (JACM)
Structure of Computers and Computations
Structure of Computers and Computations
New bounds for parallel prefix circuits
STOC '83 Proceedings of the fifteenth annual ACM symposium on Theory of computing
Parallelism exposure and exploitation in programs
Parallelism exposure and exploitation in programs
Parallelization of programs containing loop-carried dependences with resource constraints
Parallelization of programs containing loop-carried dependences with resource constraints
A New Class of Depth-Size Optimal Parallel Prefix Circuits
The Journal of Supercomputing
The design, implementation and initial evaluation of an advanced knowledge-based process scheduler
ACM SIGOPS Operating Systems Review
Symbolic algebra and timing driven data-flow synthesis
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Constructing H4, a Fast Depth-Size Optimal Parallel Prefix Circuit
The Journal of Supercomputing
Z4: a new depth-size optimal parallel prefix circuit with small depth
Neural, Parallel & Scientific Computations
A new approach to constructing optimal parallel prefix circuits with small depth
Journal of Parallel and Distributed Computing
Faster optimal parallel prefix circuits: New algorithmic construction
Journal of Parallel and Distributed Computing
Reconfigurable hardware solution to parallel prefix computation
The Journal of Supercomputing
Computation-efficient parallel prefix
AIC'06 Proceedings of the 6th WSEAS International Conference on Applied Informatics and Communications
Two families of parallel prefix algorithms for multicomputers
TELE-INFO'08 Proceedings of the 7th WSEAS International Conference on Telecommunications and Informatics
Straightforward construction of depth-size optimal, parallel prefix circuits with fan-out 2
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Parallel prefix algorithms on the multicomputer
WSEAS Transactions on Computer Research
Fast problem-size-independent parallel prefix circuits
Journal of Parallel and Distributed Computing
New parallel prefix algorithms
AIC'09 Proceedings of the 9th WSEAS international conference on Applied informatics and communications
New families of computation-efficient parallel prefix algorithms
WSEAS Transactions on Computers
On-line adaptive parallel prefix computation
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Hi-index | 14.98 |
Prefix computation is a basic operation at the core of many important applications, e.g., some of the Grand Challenge problems, circuit design, digital signal processing, graph optimizations, and computational geometry.1 In this paper, we present new and strict time-optimal parallel schedules for prefix computation with resource constraints under the concurrent-read-exclusive-write (CREW) parallel random access machine (PRAM) model. For prefix of N elements on p processors (p independent of N) when N p(p + 1)/2, we derive Harmonic Schedules that achieve the strict optimal time (steps), $\left\lceil {{{2\left( {N-1} \right)} \mathord{\left/ {\vphantom {{2\left( {N-1} \right)} {\left( {p+1} \right)}}} \right. \kern-\nulldelimiterspace} {\left( {p+1} \right)}}} \right\rceil $. We also derive Pipelined Schedules that have better program-space efficiency than the Harmonic Schedule, yet only require a small constant number of steps more than the optimal time achieved by the Harmonic Schedule. Both the Harmonic Schedules and the Pipelined Schedules are simple and easy to implement. For prefix of N elements on p processors (p independent of N) where N驴p(p + 1)/2, the Harmonic Schedules are not time-optimal. For these cases, we establish an optimization method for determining key parameters of time-optimal schedules, based on connections between the structure of parallel prefix and Pascal's triangle. Using the derived parameters, we devise an algorithm to construct such schedules. For a restricted class of values of N and p, we prove that the constructed schedules are strictly time-optimal. We also give strong empirical evidence that our algorithm constructs strict time-optimal schedules for all cases where N驴p(p + 1)/2.