NULL Convention multiply and accumulate unit with conditional rounding, scaling, and saturation

Authors:
S. C. Smith;R. F. DeMara;J. S. Yuan;M. Hagedorn;D. Ferguson
Affiliations:
Department of Electrical and Computer Engineering, University of Missouri - Rolla, 123 Emerson Electric Co. Hall, 1870 Miner Circle, Rolla, MO;School of Electrical Engineering and Computer Science, University of Central Florida, Box 162450, Orlando, FL;School of Electrical Engineering and Computer Science, University of Central Florida, Box 162450, Orlando, FL;Theseus Logic Inc., 485 North Keller Road, Suite 140, Maitland, FL;Theseus Logic Inc., 485 North Keller Road, Suite 140, Maitland, FL
Venue:
Journal of Systems Architecture: the EUROMICRO Journal
Year:
2002

References:
Introduction to algorithms
The limitations to delay-insensitivity in asynchronous circuits. Proceedings of the sixth MIT conference on Advanced research in VLSI
Programming in VLSI: from communicating processes to delay-insensitive circuits. Developments in concurrency and communication
An efficient implementation of Boolean functions as self-timed circuits. IEEE Transactions on Computers
Beware the isochronic fork. Integration, the VLSI Journal
Self-timed rings and their application to division
Design of delay insensitive circuits using multi-ring structures. EURO-DAC '92 Proceedings of the conference on European design automation
Computer arithmetic: algorithms and hardware designs
Design of self-timed asynchronous Booth's multiplier. ASP-DAC '00 Proceedings of the 2000 Asia and South Pacific Design Automation Conference
NULL Convention Logic™: a complete and consistent logic for asynchronous digital circuit synthesis. ASAP '96 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures, and Processors
A design methodology for self-timed systems
Gate and throughput optimizations for null convention self-timed digital circuits

Cited by:
Optimization of NULL convention self-timed circuits. Integration, the VLSI Journal
Development of a large word-width high-speed asynchronous multiply and accumulate unit. Integration, the VLSI Journal
Design and characterization of NULL convention arithmetic logic units. Microelectronic Engineering
Signed multiplication technique by means of unsigned multiply instruction. Computers and Electrical Engineering

Abstract

Approaches for maximizing the throughput of self-timed multiply-accumulate units (MACs) are developed and assessed using the NULL Convention Logic paradigm. In this class of self-timed circuits, functional correctness is independent of any delays in circuit elements, by construction, and independent of any wire delays, through the isochronic fork assumption [1,2], under which wire delays are assumed to be much less than gate delays. Self-timed circuits therefore provide distinct advantages for System-on-a-Chip applications.

First, a number of alternative MAC algorithms are compared in terms of throughput and area to determine which approach yields the maximum throughput with the least area. Two algorithms were found to meet these criteria well: the Modified Baugh-Wooley and Modified Booth2 algorithms. Dual-rail non-pipelined versions of these algorithms were first designed using the threshold combinational reduction method [3]. The non-pipelined designs were then optimized for throughput using the gate-level pipelining method [4]. Finally, each design was simulated using Synopsys to quantify the advantage of the dual-rail pipelined Modified Baugh-Wooley MAC, which yielded a speedup of 2.5 over its initial non-pipelined version. This design also required 20% fewer gates than the dual-rail pipelined Modified Booth2 MAC, which had the same throughput. The resulting design employs a three-stage feed-forward multiply pipeline connected to a four-stage feedback multifunctional loop to perform a 72 + 32 × 32 MAC in 12.7 ns on average using a 0.25 µm CMOS process at 3.3 V, thus outperforming other delay-insensitive/self-timed MACs in the literature.
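
As an illustration of the arithmetic behind the 72 + 32 × 32 operation, the following is a minimal behavioral sketch in Python of a signed multiply built from modified Baugh-Wooley partial products, followed by a 72-bit accumulate. It is not the authors' circuit: the function names `baugh_wooley_multiply` and `mac` are hypothetical, the model is purely functional (no dual-rail signaling, pipelining, or timing), and it assumes simple two's-complement wrap-around in the accumulator, omitting the conditional rounding, scaling, and saturation named in the title.

```python
def baugh_wooley_multiply(a: int, b: int, n: int = 32) -> int:
    """Signed n-bit x n-bit multiply via modified Baugh-Wooley partial products.

    Behavioral sketch only: partial products that involve exactly one sign
    bit are inverted, and the constants 2**n and 2**(2n-1) are added, so the
    whole array can be summed as if it were unsigned.
    """
    mask = (1 << n) - 1
    ua, ub = a & mask, b & mask          # two's-complement bit patterns
    a_bits = [(ua >> i) & 1 for i in range(n)]
    b_bits = [(ub >> j) & 1 for j in range(n)]

    total = 0
    for i in range(n):
        for j in range(n):
            pp = a_bits[i] & b_bits[j]
            # Invert the partial product when exactly one operand bit is a sign bit.
            if (i == n - 1) != (j == n - 1):
                pp ^= 1
            total += pp << (i + j)

    # Correction constants of the modified Baugh-Wooley scheme.
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1          # keep 2n result bits

    # Reinterpret the 2n-bit pattern as a signed value.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total


def mac(acc: int, x: int, y: int, acc_bits: int = 72, n: int = 32) -> int:
    """Return acc + x*y wrapped to acc_bits (assumed wrap-around, no saturation)."""
    s = (acc + baugh_wooley_multiply(x, y, n)) & ((1 << acc_bits) - 1)
    if s >> (acc_bits - 1):
        s -= 1 << acc_bits
    return s
```

For example, `mac(10, -3, 7)` accumulates the product of -3 and 7 onto 10. Handling the sign bits through inversion plus fixed constants, rather than sign extension, keeps the partial-product array regular, which is one reason this family of algorithms suits array-style hardware multipliers.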