Multiplication acceleration through twin precision

Authors:
Magnus Själander;Per Larsson-Edefors
Affiliations:
Department of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden;Department of Computer Science and Engineering, Chalmers University of Technology, Göteborg, Sweden
Venue:
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Year:
2009

Abstract

We present the twin-precision technique for integer multipliers. The technique can reduce power dissipation by adapting a multiplier to the bitwidth of the operands being computed. It also enables increased computational throughput by allowing several narrow-width operations to be computed in parallel. We describe how to apply the twin-precision technique to signed multiplier schemes as well, such as Baugh-Wooley and modified-Booth multipliers. We show that the twin-precision delay penalty is small (5%-10%) and that a significant reduction in power dissipation (40%-70%) can be achieved when operating on narrow-width operands. In an application case study, we show that extending the multiplier of a general-purpose processor with the twin-precision scheme reduces the execution time of a Fast Fourier Transform by 15%, together with a 14% reduction in datapath energy dissipation. All our evaluations are based on layout-extracted data from multipliers implemented in 130-nm and 65-nm commercial process technologies.
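The idea of running several narrow-width multiplications on one wide multiplier can be illustrated with a small behavioral model. The sketch below is not the authors' circuit. It assumes an unsigned n-bit array multiplier whose partial-product bits can be forced to zero, and the names `twin_precision_multiply`, `split_twin_result`, and the mode strings are hypothetical, chosen only for illustration. Signed schemes such as Baugh-Wooley and modified-Booth need extra handling that is not modeled here.

```python
def twin_precision_multiply(a, b, n=16, mode="full"):
    """Behavioral model of an unsigned n-bit array multiplier with gated partial products.

    mode="full": ordinary n-bit x n-bit multiplication.
    mode="low":  only the low halves are multiplied (one narrow operation).
    mode="twin": low halves and high halves are multiplied independently.
    This is an illustrative assumption, not the implementation from the paper.
    """
    if mode not in ("full", "low", "twin"):
        raise ValueError("mode must be 'full', 'low' or 'twin'")
    half = n // 2
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    result = 0
    for i in range(n):          # bit position in operand a
        for j in range(n):      # bit position in operand b
            a_low, b_low = i < half, j < half
            if mode == "full":
                keep = True
            elif mode == "low":
                keep = a_low and b_low
            else:               # "twin": drop the cross terms between halves
                keep = a_low == b_low
            if keep:
                # One partial-product bit, weighted by its column i + j.
                result += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return result


def split_twin_result(result, n=16):
    """Split a twin-mode result into (low product, high product), n bits each."""
    return result & ((1 << n) - 1), result >> n


# Two 8-bit multiplications packed into one 16-bit multiplier.
a = (200 << 8) | 13
b = (7 << 8) | 250
low, high = split_twin_result(twin_precision_multiply(a, b, mode="twin"))
print(low, high)
```

With the cross terms gated off, the low-half product occupies the lower n result bits and the high-half product the upper n bits, so the two results never overlap. In hardware, the gated partial products no longer toggle, which is the intuition behind the power savings reported for narrow-width operands.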