Floating-point arithmetic (FPA) is a crucial building block in many application domains, including scientific, numerical, and signal-processing applications, and multiplication is one of its most frequently used operations. This paper presents three architectures for double-precision (DP) multiplication, one of which can also perform run-time-reconfigurable (RTR) dual single-precision (SP) multiplication. The first design is based on a novel block-level truncated multiplication scheme that eliminates one third of the multiplier blocks while maintaining high performance, and its result stays within 1 ULP (unit in the last place) of the IEEE-754 floating-point standard precision. The second design recovers the accuracy lost by the first: using the same number of multiplier blocks plus some extra hardware, it achieves full precision with better performance and lower latency than existing work. The third architecture operates on either a single pair of double (extended) precision operands or dual pairs of single (extended) precision operands, without any pipeline stall, and with attractive area, speed, and latency results. The first design suits applications with slightly relaxed precision requirements, whereas the other two are fully compliant with IEEE standard accuracy. Design-1 achieves around 300 MHz and 450 MHz on Virtex-4 (V4) and Virtex-5 (V5), respectively, with only 6 DSP48 blocks and a latency of 9 cycles. Design-2 achieves about 325 MHz (V4) and 400 MHz (V5), also with only 6 DSP48 blocks, with full precision support. The third design exceeds 250 MHz (V4) and 325 MHz (V5) while providing on-the-fly dual-precision support, with hardware requirements similar to double-precision-only implementations in the literature. Comparisons with the best floating-point multipliers reported in the literature show promising results.