In this work, we address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a $w^k$-bit sequence ($k \geq 2$), using as basic building blocks linear arrays of at most $w^2$ shift switches, where $w$ is a small power of $2$. An immediate consequence of this choice is that broadcasts in our designs are limited to buses of length at most $w^2$. We adopt a VLSI delay model in which the "length" of a bus is proportional to the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a $w^k$-bit binary sequence in the time of $2k-2$ broadcasts, while the corresponding prefix sums can be computed in the time of $3k-4$ broadcasts. Quite remarkably, even though our hardware-algorithm uses only linear arrays of size at most $w^2$, the total number of broadcasts involved is less than three times the number required by an "ideal" design. We then propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a $kw^k$-bit binary sequence in the time of $3k+\lceil\log_w k\rceil-3$ broadcasts. Using this design, the corresponding prefix sums can be computed in the time of $4k+\lceil\log_w k\rceil-5$ broadcasts.
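To make the stated bounds concrete, the following minimal Python sketch evaluates the four broadcast counts given above for chosen values of $k$ and $w$, and includes a plain software reference for the prefix sums of a binary sequence. The function names `ceil_log`, `broadcast_counts`, and `prefix_sums` are hypothetical and introduced only for illustration; the sketch does not model the shift-switch hardware or the bus delay model, only the formulas and the function being computed.

```python
from itertools import accumulate


def ceil_log(base, value):
    """Smallest integer e with base**e >= value, using integer arithmetic only."""
    exponent, power = 0, 1
    while power < value:
        power *= base
        exponent += 1
    return exponent


def broadcast_counts(k, w):
    """Broadcast counts quoted in the abstract (hypothetical helper).

    The first two entries refer to a w**k-bit input, the pipelined
    entries to a k * w**k-bit input.
    """
    if k < 2 or w < 2:
        raise ValueError("the stated bounds assume k >= 2 and w >= 2")
    extra = ceil_log(w, k)  # the ceil(log_w k) term
    return {
        "sum": 2 * k - 2,
        "prefix_sums": 3 * k - 4,
        "pipelined_sum": 3 * k + extra - 3,
        "pipelined_prefix_sums": 4 * k + extra - 5,
    }


def prefix_sums(bits):
    """Software reference: the i-th output is the sum of the first i+1 bits."""
    return list(accumulate(bits))


if __name__ == "__main__":
    print(broadcast_counts(k=3, w=4))
    print(prefix_sums([1, 0, 1, 1]))
```

For example, with $k = 3$ and $w = 4$ the formulas give 4 broadcasts for the sum and 5 for the prefix sums of a 64-bit input, and 7 and 8 broadcasts respectively for the pipelined design on a 192-bit input.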