A New Class of Depth-Size Optimal Parallel Prefix Circuits

Authors:
Yen-Chun Lin; Chao-Cheng Shih
Affiliations:
Department of Electronic Engineering, National Taiwan University of Science and Technology, P.O. Box 90-100, Taipei 106, Taiwan (both authors); e-mail: yclin@et.ntust.edu.tw
Venue:
The Journal of Supercomputing
Year:
1999

References (11):

Principles of CMOS VLSI design: a systems perspective.
Depth-size trade-offs for parallel prefix computation. Journal of Algorithms.
Faster optimal parallel prefix sums and list ranking. Information and Computation.
Scans as Primitive Parallel Operations. IEEE Transactions on Computers.
Limited width parallel prefix circuits. The Journal of Supercomputing.
Optimal schedules for parallel prefix computation with bounded resources. PPOPP '91: Proceedings of the Third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.
Parallel computing using the prefix problem.
The Strict Time Lower Bound and Optimal Schedules for Parallel Prefix with Resource Constraints. IEEE Transactions on Computers.
Parallel computation: models and methods.
Parallel Prefix Computation. Journal of the ACM (JACM).
New bounds for parallel prefix circuits. STOC '83: Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing.

Cited by (13):

Constructing H4, a Fast Depth-Size Optimal Parallel Prefix Circuit. The Journal of Supercomputing.
Z4: a new depth-size optimal parallel prefix circuit with small depth. Neural, Parallel & Scientific Computations.
A new approach to constructing optimal parallel prefix circuits with small depth. Journal of Parallel and Distributed Computing.
Constructing zero-deficiency parallel prefix adder of minimum depth. Proceedings of the 2005 Asia and South Pacific Design Automation Conference.
Faster optimal parallel prefix circuits: New algorithmic construction. Journal of Parallel and Distributed Computing.
On the construction of zero-deficiency parallel prefix circuits with minimum depth. ACM Transactions on Design Automation of Electronic Systems (TODAES).
Computation-efficient parallel prefix. AIC'06: Proceedings of the 6th WSEAS International Conference on Applied Informatics and Communications.
Two families of parallel prefix algorithms for multicomputers. TELE-INFO'08: Proceedings of the 7th WSEAS International Conference on Telecommunications and Informatics.
Straightforward construction of depth-size optimal, parallel prefix circuits with fan-out 2. ACM Transactions on Design Automation of Electronic Systems (TODAES).
Parallel prefix algorithms on the multicomputer. WSEAS Transactions on Computer Research.
New parallel prefix algorithms. AIC'09: Proceedings of the 9th WSEAS International Conference on Applied Informatics and Communications.
New families of computation-efficient parallel prefix algorithms. WSEAS Transactions on Computers.
A novel hybrid parallel-prefix adder architecture with efficient timing-area characteristic. IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

Abstract

Given n values x_1, x_2, …, x_n and an associative binary operation ∘, the prefix problem is to compute x_1 ∘ x_2 ∘ ⋯ ∘ x_i for every i with 1 ≤ i ≤ n. Many combinational circuits for solving the prefix problem, called prefix circuits, have been designed. It has been proved that the size s(D(n)) and the depth d(D(n)) of any n-input prefix circuit D(n) satisfy the inequality d(D(n)) + s(D(n)) ≥ 2n − 2; thus, a prefix circuit is depth-size optimal if d(D(n)) + s(D(n)) = 2n − 2. In this paper, we construct a new depth-size optimal prefix circuit, SL(n). In addition, we can build depth-size optimal prefix circuits whose depth is any integer between d(SL(n)) and n − 1. SL(n) has the same maximum fan-out, ⌈lg n⌉ + 1, as Snir's SN(n), but its depth is smaller; thus, SL(n) is faster. Compared with another depth-size optimal prefix circuit, LYD(n), SL(n) satisfies d(LYD(n)) ≤ d(SL(n)) ≤ d(LYD(n)) + 2. However, LYD(n) may have a fan-out of up to 2⌈lg n⌉ − 2, and its fan-out is greater than that of SL(n) for almost all n ≥ 12. Because an operation node with greater fan-out occupies more chip area and is slower in a VLSI implementation, SL(n) in most cases needs less area and may be faster than LYD(n). Moreover, SL(n) is much easier to design than LYD(n).
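To make the definitions of size, depth, fan-out and the bound d + s ≥ 2n − 2 concrete, here is a minimal Python sketch. It is not the authors' SL(n) construction, which the abstract does not describe. It encodes a prefix circuit as an ordered list of operation nodes (i, j), meaning wire j ← wire i ∘ wire j with i < j; this encoding and all function names (evaluate, measure, serial, sklansky) are hypothetical. The serial circuit (depth n − 1, size n − 1) meets the bound with equality. The Sklansky divide-and-conquer circuit, a standard circuit not discussed in the abstract, is included only as a contrast: small depth, but d + s > 2n − 2. The fan-out convention used here (operation nodes reading a value, plus one if the value is a circuit output) is an assumption, since conventions vary between papers.

```python
from collections import Counter
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
# Hypothetical encoding: op (i, j) with i < j sets wire j <- wire i o wire j.
Op = Tuple[int, int]


def evaluate(ops: Sequence[Op], xs: Sequence[T], f: Callable[[T, T], T]) -> List[T]:
    """Run the circuit; wire j should end up holding x_1 o ... o x_(j+1)."""
    vals = list(xs)
    for i, j in ops:
        vals[j] = f(vals[i], vals[j])  # left operand covers the lower indices
    return vals


def measure(ops: Sequence[Op], n: int) -> Tuple[int, int, int]:
    """Return (depth, size, max fan-out) of a circuit on n >= 1 wires.

    Assumed fan-out convention: the number of operation nodes that read a
    value, plus one if that value is a circuit output.
    """
    depth = [0] * n           # depth of the value currently on each wire
    version = list(range(n))  # unique id of the value currently on each wire
    reads = Counter()         # value id -> number of consumers
    next_id = n
    for i, j in ops:
        reads[version[i]] += 1
        reads[version[j]] += 1
        depth[j] = max(depth[i], depth[j]) + 1
        version[j] = next_id  # the operation node produces a new value
        next_id += 1
    for w in range(n):        # the final value on every wire is an output
        reads[version[w]] += 1
    return max(depth), len(ops), max(reads.values())


def is_depth_size_optimal(ops: Sequence[Op], n: int) -> bool:
    d, s, _ = measure(ops, n)
    return d + s == 2 * n - 2


def serial(n: int) -> List[Op]:
    """Ripple circuit: depth n - 1 and size n - 1."""
    return [(j - 1, j) for j in range(1, n)]


def sklansky(n: int) -> List[Op]:
    """Divide-and-conquer circuit with depth ceil(lg n) (contrast only)."""
    ops: List[Op] = []
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            last = start + span - 1  # holds x_start .. x_last after earlier stages
            for j in range(last + 1, min(start + 2 * span, n)):
                ops.append((last, j))
        span *= 2
    return ops


if __name__ == "__main__":
    n = 8
    xs = list(range(1, n + 1))
    for name, ops in [("serial", serial(n)), ("Sklansky", sklansky(n))]:
        assert evaluate(ops, xs, lambda a, b: a + b) == list(accumulate(xs))
        print(name, measure(ops, n), is_depth_size_optimal(ops, n))
```

For n = 8 this prints `serial (7, 7, 2) True` and `Sklansky (3, 12, 5) False`: the serial circuit reaches d + s = 14 = 2n − 2, while Sklansky's circuit buys depth 3 with 12 nodes (d + s = 15) and a value with fan-out 5. Circuits such as SN(n) and SL(n) aim to keep d + s = 2n − 2 while reducing depth and controlling fan-out. String concatenation is a convenient associative but non-commutative test operation, because it catches swapped operands.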