Faster optimal parallel prefix sums and list ranking
Information and Computation
Introduction to parallel algorithms and architectures: array, trees, hypercubes
The parallel complexity of integer prefix summation
Information Processing Letters
Parallel computation: models and methods
Z4: a new depth-size optimal parallel prefix circuit with small depth
Neural, Parallel & Scientific Computations
A new approach to constructing optimal parallel prefix circuits with small depth
Journal of Parallel and Distributed Computing
Multiple Addition and Prefix Sum on a Linear Array with a Reconfigurable Pipelined Bus System
The Journal of Supercomputing
High-Speed Parallel-Prefix VLSI Ling Adders
IEEE Transactions on Computers
Faster optimal parallel prefix circuits: New algorithmic construction
Journal of Parallel and Distributed Computing
On the construction of zero-deficiency parallel prefix circuits with minimum depth
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Fast problem-size-independent parallel prefix circuits
Journal of Parallel and Distributed Computing
A Class of an Almost-Optimal Size-Independent Parallel Prefix Circuits
IPDPSW '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
Prefix computation is a fundamental problem with many applications, such as fast adders. Most proposed parallel prefix circuits assume that the circuit has the same width as the input size. In this paper, we present a class of parallel prefix circuits that perform well when the input size, n, exceeds the width of the circuit, m. That is, the proposed circuit is almost optimal in speed when n > m. Specifically, we derive a lower bound on the depth of the circuit and prove that the circuit requires one time step more than the optimal number of time steps needed to generate its first output. We also show that the size of the circuit is within one of optimal. The input is divided into subsets, each of width m-1, which are presented to the circuit in successive time steps. The circuit is compared with the functionally similar circuits of (Lin, 1999 [9]; Lin and Hung, 2009 [12]) to show that it is faster. When n > m, the circuit is shown to be faster than the family of waist-size optimal circuits with waist 1 (WSO-1) of the same width and fan-out.
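To illustrate the idea of feeding an input of size n through a unit of width m in blocks of m-1, here is a minimal functional sketch in Python. It models only the input/output behavior, not the paper's circuit structure, depth, or size. The function name `chunked_prefix`, the `carry` variable, and the assumption that the remaining slot of the width-m unit holds the running result from the previous block are all illustrative and do not come from the abstract.

```python
from itertools import accumulate
from operator import add


def chunked_prefix(xs, m, op=add):
    """Functional model (hypothetical): prefix results of xs via a width-m unit.

    Each step consumes up to m-1 new inputs. The remaining slot is assumed
    to carry the running prefix from the previous step.
    """
    if m < 2:
        raise ValueError("width m must be at least 2")
    out = []
    carry = None  # no running prefix exists before the first block
    step = m - 1
    for start in range(0, len(xs), step):
        block = list(xs[start:start + step])
        # The unit sees at most m values: the carry plus m-1 fresh inputs.
        unit_in = block if carry is None else [carry] + block
        unit_out = list(accumulate(unit_in, op))
        # Drop the echoed carry; what remains are the new prefix outputs.
        new = unit_out if carry is None else unit_out[1:]
        out.extend(new)
        carry = new[-1]
    return out


print(chunked_prefix([1, 2, 3, 4, 5, 6, 7], 4))  # [1, 3, 6, 10, 15, 21, 28]
```

With n = 7 and m = 4, the input is consumed in blocks of three, so the model takes three steps. Any associative operator can be passed as `op`.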