The prefix problem is to compute all the products $x_1 \otimes x_2 \otimes \cdots \otimes x_k$ for $1 \le k \le n$, where $\otimes$ is an associative binary operation. We start with an asynchronous circuit that solves this problem with $O(\log n)$ latency and $O(n \log n)$ circuit size, using $O(n)$ $\otimes$-operations. Our contributions are: 1) a modification to the circuit that improves its average-case latency from $O(\log n)$ to $O(\log \log n)$, and 2) a further modification that allows the circuit to run at full throughput, i.e., with constant response time. The construction can be used to obtain an asynchronous adder with $O(\log n)$ worst-case latency and $O(\log \log n)$ average-case latency.
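To make the problem statement concrete, the following is a minimal software sketch of a logarithmic-depth prefix computation and of its use for binary addition. It is not the asynchronous circuit described above: it models a simple synchronous scheme with $\lceil \log_2 n \rceil$ levels that uses $O(n \log n)$ $\otimes$-operations rather than $O(n)$, and it does not model average-case latency or pipelining. The names `prefix_scan`, `carry_op`, and `add_via_prefix`, and the kill/propagate/generate encoding as the strings `"k"`, `"p"`, `"g"`, are assumptions introduced here for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def prefix_scan(xs: Sequence[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive prefix: result[k-1] = x_1 op x_2 op ... op x_k.

    `op` must be associative; it need not be commutative.
    """
    out = list(xs)
    step = 1
    while step < len(out):
        # One loop iteration models one level of logic: every combination
        # reads only values from the previous level.
        prev = out[:]
        for i in range(step, len(out)):
            out[i] = op(prev[i - step], prev[i])
        step *= 2
    return out


def carry_op(low: str, high: str) -> str:
    """Combine carry statuses of a lower-order and a higher-order bit range.

    "k" = kill the carry, "g" = generate a carry, "p" = propagate whatever
    carry arrives from the lower-order range.
    """
    return low if high == "p" else high


def add_via_prefix(a: int, b: int, width: int) -> int:
    """Add two non-negative integers modulo 2**width using a prefix scan."""
    status = []
    for i in range(width):  # bit 0 is the least significant bit
        ai, bi = (a >> i) & 1, (b >> i) & 1
        status.append("g" if ai & bi else "p" if ai ^ bi else "k")

    # carries[i] describes the carry out of bit positions 0..i (carry-in 0).
    carries = prefix_scan(status, carry_op)

    result = 0
    carry_in = 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry_in) << i
        carry_in = 1 if carries[i] == "g" else 0
    return result
```

With an associative but non-commutative operation such as string concatenation, `prefix_scan(list("abcd"), lambda x, y: x + y)` yields the four prefixes in order, which shows that operand order is preserved. In the adder, a carry only has to travel through a run of consecutive `"p"` positions, which gives some intuition for why typical inputs can resolve faster than the worst case.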