A high-speed, energy-efficient two-cycle multiply-accumulate (MAC) architecture and its application to a double-throughput MAC unit

Authors:
Tung Thanh Hoang; Magnus Själander; Per Larsson-Edefors
Affiliations:
Department of Computer Science and Engineering, Chalmers University of Technology, Gothenburg, Sweden (all authors)
Venue:
IEEE Transactions on Circuits and Systems Part I: Regular Papers - Special section on 2009 IEEE system-on-chip conference
Year:
2010

Abstract

We propose a high-speed and energy-efficient two-cycle multiply-accumulate (MAC) architecture that supports two's complement numbers and includes accumulation guard bits and saturation circuitry. The first MAC pipeline stage contains only partial-product generation circuitry and a reduction tree, while the second stage, thanks to a special sign-extension solution, implements all other functionality. Place-and-route evaluations using a 65-nm 1.1-V cell library show that the proposed architecture offers a 31% improvement in speed and a 32% reduction in energy per operation, averaged across operand sizes of 16, 32, 48, and 64 bits, over a reference two-cycle MAC architecture that employs a multiplier in the first stage and an accumulator in the second. When the proposed architecture operates at the lower frequency of the reference architecture, the available timing slack can be used to downsize gates, resulting in a 52% reduction in energy compared to the reference. We extend the new architecture to create a versatile double-throughput MAC (DTMAC) unit that efficiently performs either multiply-accumulate or multiply operations for N-bit, 1 × N/2-bit, or 2 × N/2-bit operands. Compared with a fixed-function 32-bit MAC unit, a 32-bit DTMAC unit executes 16-bit multiply-accumulate operations with 67% higher energy efficiency.
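The split described above can be illustrated with a small behavioral model. In this model the first cycle only generates partial products and reduces them to a carry-save (sum, carry) pair, and the second cycle does everything else. The Python sketch below is not the authors' design. It assumes radix-2 partial products rather than any Booth encoding, and it sign-extends naively by wrapping every vector modulo 2**width rather than using the paper's sign-extension solution. The accumulator width of 2N bits plus a hypothetical `guard_bits` parameter (default 8) is also an assumption. The sketch shows one way the multiplier's final adder and the accumulator adder can collapse into a single extra 3:2 row followed by one carry-propagate addition.

```python
# Behavioral sketch of a two-cycle MAC (cycle 1: partial products + reduction
# tree; cycle 2: accumulate, final add, saturate).
# ASSUMPTIONS (not from the paper): radix-2 partial products (no Booth
# encoding), naive sign extension by wrapping every vector modulo 2**width,
# a hypothetical guard_bits parameter, and saturation to 2N + guard_bits bits.


def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a two's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def csa(a: int, b: int, c: int, width: int) -> tuple[int, int]:
    """One row of full adders (3:2 compressor): a + b + c == s + carry mod 2**width."""
    mask = (1 << width) - 1
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s & mask, carry & mask


def stage1(a: int, b: int, n: int, width: int) -> tuple[int, int]:
    """Cycle 1: partial-product generation plus a carry-save reduction tree.
    Returns a (sum, carry) pair; no carry-propagate addition happens here."""
    mask = (1 << width) - 1
    rows = []
    for i in range(n):
        if (b >> i) & 1:
            pp = a << i
            if i == n - 1:  # the multiplier's MSB has weight -2**(n-1)
                pp = -pp
            rows.append(pp & mask)  # wrap into width bits (sign-extends pp)
    rows += [0] * max(0, 2 - len(rows))
    while len(rows) > 2:  # Wallace-style: compress groups of three rows
        full = len(rows) // 3 * 3
        nxt = []
        for j in range(0, full, 3):
            nxt.extend(csa(rows[j], rows[j + 1], rows[j + 2], width))
        rows = nxt + rows[full:]
    return rows[0], rows[1]


def stage2(s: int, c: int, acc: int, acc_width: int, width: int) -> int:
    """Cycle 2: merge the accumulator with one more 3:2 row, resolve the result
    with a single carry-propagate addition, then saturate to acc_width bits."""
    s2, c2 = csa(s, c, acc & ((1 << width) - 1), width)
    total = to_signed(s2 + c2, width)  # the only carry-propagate add
    hi = (1 << (acc_width - 1)) - 1
    return max(-hi - 1, min(hi, total))


def mac(acc: int, a: int, b: int, n: int, guard_bits: int = 8) -> int:
    """acc + a*b for n-bit signed a, b and a (2n + guard_bits)-bit accumulator."""
    acc_width = 2 * n + guard_bits
    width = acc_width + 1  # one spare bit: acc + a*b cannot wrap before saturation
    s, c = stage1(a, b, n, width)
    return stage2(s, c, acc, acc_width, width)


acc = 0
for a, b in [(-32768, -32768), (1234, -5678), (-1, 1)]:
    acc = mac(acc, a, b, n=16)
print(acc)  # prints 1066735171
```

A quick way to check such a model is to compare `mac` exhaustively against the direct expression `max(lo, min(hi, acc + a * b))` for a small operand size such as `n = 4`. The DTMAC modes (1 × N/2-bit and 2 × N/2-bit) are not modeled in this sketch.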