Implementing Multiply-Accumulate Operation in Multiplication Time

Authors:
Paul F. Stelling, Vojin G. Oklobdzija
Venue:
Proceedings of the 13th Symposium on Computer Arithmetic (ARITH '97)
Year:
1997

Abstract

Multiply-accumulate is an important and expensive operation that is used frequently in digital signal processing and video/graphics applications. Consequently, any reduction in its delay could improve clock speed, instruction time, and overall processor performance. In this paper, we show how, by extending our view of a parallel multiplier, we can apply recent innovations in parallel multiplier design to multiply-accumulators. The resulting multiply-accumulators are as fast as multipliers of the same size (multipliers that have been shown to achieve provably optimal delay, faster than current designs). A single optimal multiply-accumulate circuit can therefore serve both operations without a delay penalty, so multiply-accumulate can be implemented efficiently and effectively as an instruction in RISC CPUs. In addition, the design requires fewer devices than current fast multiplier designs, which also yields savings in area and power.
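The abstract does not spell out the construction. One common way to "extend the view" of a parallel multiplier is to treat the accumulator as one more row of the partial-product matrix, so that the same carry-save reduction tree absorbs it before the single final addition. The Python sketch below assumes that approach and a plain Wallace-style tree of 3:2 counters. It is not the authors' optimal circuit, and the helper names (`partial_products`, `csa`, `reduce_rows`, `multiply`, `multiply_accumulate`) are hypothetical. The number of CSA layers serves only as a coarse proxy for delay.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    # One shifted copy of `a` for every bit of the n-bit multiplier `b`
    # (zero rows are kept so the row count is always n)
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def csa(x: int, y: int, z: int) -> tuple[int, int]:
    # 3:2 carry-save adder applied to every bit position at once;
    # invariant: x + y + z == sum_ + carry
    sum_ = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return sum_, carry


def reduce_rows(rows: list[int]) -> tuple[int, int, int]:
    """Wallace-style reduction: apply CSA layers until two rows remain.

    Returns (row0, row1, levels); `levels` counts CSA layers, a rough
    stand-in for delay in this simplified model."""
    rows = list(rows) + [0] * max(0, 2 - len(rows))
    levels = 0
    while len(rows) > 2:
        groups = len(rows) // 3
        nxt: list[int] = []
        for g in range(groups):
            nxt.extend(csa(*rows[3 * g:3 * g + 3]))
        nxt.extend(rows[3 * groups:])  # 0-2 leftover rows pass through unchanged
        rows = nxt
        levels += 1
    return rows[0], rows[1], levels


def multiply(a: int, b: int, n: int) -> tuple[int, int]:
    s, c, levels = reduce_rows(partial_products(a, b, n))
    return s + c, levels  # final carry-propagate addition


def multiply_accumulate(a: int, b: int, acc: int, n: int) -> tuple[int, int]:
    # Hypothetical MAC: the accumulator is simply one more row of the matrix
    s, c, levels = reduce_rows(partial_products(a, b, n) + [acc])
    return s + c, levels


if __name__ == "__main__":
    for n in (8, 16, 32):
        _, mul_levels = multiply(0, 0, n)
        _, mac_levels = multiply_accumulate(0, 0, 0, n)
        print(n, mul_levels, mac_levels)  # prints 8 4 4, then 16 6 6, then 32 8 8
```

In this coarse model the extra row happens to cost no additional layer for 8-, 16-, and 32-bit operands. At other sizes it can add one (9 rows need 4 layers, while 10 rows need 5), so the sketch illustrates the idea of folding the addend into the tree rather than reproducing the paper's delay result.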