Floating-point arithmetic (FPA) is a crucial building block in many application domains, including scientific, numerical, and signal-processing applications, and multiplication is one of its most frequently used operations. This paper presents three architectures for double-precision (DP) multiplication, one of which can also perform run-time-reconfigurable (RTR) dual single-precision (SP) multiplication. The first design is based on a novel block-level truncated multiplication scheme that eliminates one third of the multiplier blocks while maintaining high performance, and its result stays within 1 ULP (unit in the last place) of the IEEE-754 floating-point standard precision. The second design recovers the accuracy lost by the first: using the same number of multiplier blocks plus some extra hardware, it achieves full precision with better performance and lower latency than existing work. The third architecture operates on either a single pair of double (extended) precision operands or dual pairs of single (extended) precision operands, without any pipeline stall, and with attractive area, speed, and latency results. The first design suits applications with slightly relaxed precision requirements, whereas the other two are fully compliant with IEEE standard accuracy. Design-1 achieves around 300 MHz and 450 MHz on Virtex-4 (V4) and Virtex-5 (V5), respectively, with only 6 DSP48 blocks and a latency of 9 cycles. Design-2 achieves about 325 MHz (V4) and 400 MHz (V5), also with only 6 DSP48 blocks, with full precision support. The third design exceeds 250 MHz (V4) and 325 MHz (V5) while providing on-the-fly dual-precision support, with hardware requirements similar to double-precision-only implementations in the literature. Comparisons with the best floating-point multipliers reported in the literature show promising results.