Computer organization and design (2nd ed.): the hardware/software interface
Computer organization and design (2nd ed.): the hardware/software interface
High-Speed Booth Encoded Parallel Multiplier Design
IEEE Transactions on Computers - Special issue on computer arithmetic
VIS Speeds New Media Processing
IEEE Micro
Subword Parallelism with MAX-2
IEEE Micro
Exploiting data-width locality to increase superscalar execution bandwidth
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
An Efficient Twin-Precision Multiplier
ICCD '04 Proceedings of the IEEE International Conference on Computer Design
Architecture and Implementation of a Vector/SIMD Multiply-Accumulate Unit
IEEE Transactions on Computers
A Flexible Datapath Interconnect for Embedded Applications
ISVLSI '07 Proceedings of the IEEE Computer Society Annual Symposium on VLSI
A Two's Complement Parallel Array Multiplication Algorithm
IEEE Transactions on Computers
A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations
IEEE Transactions on Computers
IEEE Transactions on Circuits and Systems Part I: Regular Papers - Special section on 2009 IEEE system-on-chip conference
A new high radix-2r (r≥8) multibit recoding algorithm for large operand size (N≥32) multipliers
ACM SIGARCH Computer Architecture News
Hi-index | 0.00 |
We present the twin-precision technique for integer multipliers. The twin-precision technique can reduce the power dissipation by adapting a multiplier to the bitwidth of the operands being computed. The technique also enables an increased computational throughput, by allowing several narrow-width operations to be computed in parallel. We describe how to apply the twin-precision technique also to signed multiplier schemes, such as Baugh-Wooley and modified-Booth multipliers. It is shown that the twin-precision delay penalty is small (5%-10%) and that a significant reduction in power dissipation (40%-70%) can be achieved, when operating on narrow-width operands. In an application case study, we show that by extending the multiplier of a general-purpose processor with the twin-precision scheme, the execution time of a Fast Fourier Transform is reduced with 15% at a 14% reduction in datapath energy dissipation. All our evaluations are based on layout-extracted data from multipliers implemented in 130-nm and 65-nm commercial process technologies.