Advances in semiconductor technology bring to the market incredibly dense devices capable of hosting tens to hundreds of floating-point operators on a single chip; the latest field-programmable gate arrays (FPGAs) are no exception. To reduce the complexity of deploying these devices in computationally intensive applications, this article proposes hardware schemes for addition-related floating-point operators based on the self-alignment technique (SAT). The article demonstrates that the schemes guarantee accuracy as if the sum were computed exactly in the precision of the operator's internal mantissa and then faithfully rounded to working precision. To achieve this, the article adopts the redundant high-radix carry-save (HRCS) format for the rapid addition of wide mantissas. Implementation results show that combining the SAT with the HRCS format allows complex operators to be implemented with reduced area and latency, more so when a fused-path approach is adopted. The article also proposes a new hardware operator for performing endomorphic HRCS additions and presents a new technique for speeding up the conversion from the redundant HRCS format to a conventional binary format.
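The key property of redundant carry-save formats is that a running sum can be kept as a (sum, carry) pair, so each accumulation step is carry-free and only one carry-propagating addition is needed at the end to convert back to conventional binary. As an illustrative sketch only (ordinary radix-2 carry-save on integers, not the paper's HRCS hardware or its floating-point alignment logic), the idea can be modeled as follows:

```python
def csa(a, b, c):
    """3:2 compressor: reduce three operands to a redundant (sum, carry)
    pair without propagating carries across the word."""
    s = a ^ b ^ c                              # bitwise sum, carries ignored
    k = ((a & b) | (a & c) | (b & c)) << 1     # generated carries, shifted up
    return s, k

def accumulate(values):
    """Accumulate integers in redundant carry-save form; a single
    conventional addition at the end converts back to binary."""
    s, k = 0, 0
    for v in values:
        s, k = csa(s, k, v)   # each step has constant logic depth
    return s + k              # the one carry-propagating add

print(accumulate([3, 5, 7, 11]))  # -> 26
```

The same principle underlies the HRCS format, except that the redundancy is organized in high-radix digits so wide mantissas can be added with short, digit-local carry chains; the paper's contribution includes making the final redundant-to-binary conversion fast in hardware.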