Decimal floating-point operations are important for applications that cannot tolerate errors from conversions between binary and decimal formats, such as commercial, financial, and insurance applications. In this paper, we present a parallel decimal fixed-point multiplier designed to exploit the features of Virtex-5 FPGAs. Our multiplier is based on BCD recoding schemes, fast partial product generation, and a BCD-4221 carry-save adder reduction tree. Pipeline stages can be added to target low latency. Furthermore, we extend the multiplier with an accurate scalar product unit for the IEEE 754-2008 decimal64 data format in order to provide an important operation with the least possible rounding error. Compared with previously published work, we improve the architecture of the accurate scalar product unit and migrate to Virtex-5 FPGAs. This decreases the fixed-point multiplier's latency by a factor of two and the accurate scalar product unit's latency by a factor of five.
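To illustrate why a scalar product with a single final rounding matters, the following is a minimal software sketch, not the hardware design described above. It assumes finite decimal64 inputs and uses Python's arbitrary-precision integers as a stand-in for a wide fixed-point accumulator. The names `naive_dot`, `accurate_dot`, and `DECIMAL64` are hypothetical and chosen only for this example.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

# IEEE 754-2008 decimal64: 16 significant digits, exponent range -383..384.
DECIMAL64 = Context(prec=16, rounding=ROUND_HALF_EVEN, Emin=-383, Emax=384)

def naive_dot(xs, ys, ctx=DECIMAL64):
    """Round after every multiply and every add; errors can accumulate."""
    acc = Decimal(0)
    for x, y in zip(xs, ys):
        acc = ctx.add(acc, ctx.multiply(x, y))
    return acc

def _signed_coefficient(d):
    """Split a finite Decimal into (signed integer coefficient, exponent)."""
    sign, digits, exponent = d.as_tuple()
    coefficient = int("".join(map(str, digits)))
    return (-coefficient if sign else coefficient), exponent

def accurate_dot(xs, ys, ctx=DECIMAL64):
    """Sum exact products in an integer accumulator, then round once.

    Assumption: all inputs are finite (no NaN or infinity handling).
    """
    products = []
    for x, y in zip(xs, ys):
        cx, ex = _signed_coefficient(x)
        cy, ey = _signed_coefficient(y)
        products.append((cx * cy, ex + ey))  # exact product
    if not products:
        return Decimal(0)
    # Align every product to the smallest exponent so the sum is an integer.
    min_exp = min(e for _, e in products)
    total = sum(c * 10 ** (e - min_exp) for c, e in products)
    # The tuple constructor is exact; it does not round to any context.
    sign = 1 if total < 0 else 0
    digits = tuple(int(ch) for ch in str(abs(total)))
    exact = Decimal((sign, digits, min_exp))
    return ctx.plus(exact)  # the only rounding step

# Cancellation example: the small term is lost when rounding after each step.
xs = [Decimal("1E20"), Decimal("1"), Decimal("-1E20")]
ys = [Decimal("1"), Decimal("1"), Decimal("1")]
print(naive_dot(xs, ys), accurate_dot(xs, ys))
```

In this example the per-operation rounding in `naive_dot` discards the middle term, because `1E20 + 1` does not fit in 16 digits, while `accurate_dot` keeps it until the final rounding. A hardware unit would replace the unbounded Python integer with a fixed-width accumulator sized for the decimal64 exponent range.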