Optimal reconfigurable hardware implementations may require arbitrary floating-point formats that do not necessarily conform to IEEE-specified sizes. We have previously presented a variable precision floating-point library for use with reconfigurable hardware. We recently added three advanced components to this library: floating-point division, floating-point square root, and floating-point accumulation. These components use algorithms that are well suited to FPGA implementation and exhibit a good tradeoff between area, latency, and throughput. The floating-point format of our library is both general and flexible. All IEEE formats, including the 64-bit double-precision format, are subsets of our format, as are all previously published floating-point formats for reconfigurable hardware. Because every library component supports this generic format, it is easy and convenient to create a pipelined, custom datapath with the optimal bitwidth for each operation. A design built with our library can achieve more parallelism and lower power dissipation than one that adheres to a standard format. To further increase parallelism and reduce power dissipation, our library also supports hybrid fixed- and floating-point operations in the same design. The division and square root designs are based on table lookup and Taylor series expansion, and make use of the memories and multipliers embedded on the FPGA chip. The iterative accumulator uses the library addition module together with buffering and control logic to achieve performance similar to that of the adder by itself. All three components are fully pipelined, with clock speeds comparable to those of the other library components, to aid the designer in implementing fast, complex, pipelined designs.
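To make the "table lookup and Taylor series expansion" idea concrete, the following is a minimal software sketch of one well-known way to combine the two for division. It is an illustration under stated assumptions, not the library's actual hardware design: the name `approx_divide`, the constant `TABLE_BITS`, and the choice of a first-order expansion are all hypothetical. The divisor mantissa y in [1, 2) is split into a high part y_hi (its top `TABLE_BITS` fraction bits) and a low remainder y_lo, and 1/y is approximated as (y_hi - y_lo) / y_hi^2, with 1/y_hi^2 read from a table. Since 1/y = (y_hi - y_lo) / (y_hi^2 - y_lo^2), the relative error is y_lo^2 / y_hi^2, which is below 2^(-2 * TABLE_BITS).

```python
import math

# Hypothetical parameter: number of mantissa fraction bits used as table index.
TABLE_BITS = 8

# Lookup table of 1 / y_hi**2 for every y_hi = 1 + i / 2**TABLE_BITS in [1, 2).
# On an FPGA this would live in an embedded memory block (assumption).
RECIP_SQ_TABLE = [
    1.0 / (1.0 + i / 2**TABLE_BITS) ** 2 for i in range(2**TABLE_BITS)
]


def approx_divide(x: float, y: float) -> float:
    """Approximate x / y using one table lookup and two multiplications."""
    if y == 0.0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if (x < 0) != (y < 0) else 1.0

    # Normalize |y| to mantissa * 2**exponent with mantissa in [1, 2).
    mantissa, exponent = math.frexp(abs(y))  # frexp gives mantissa in [0.5, 1)
    mantissa *= 2.0
    exponent -= 1

    # Split the mantissa: y_hi holds the top TABLE_BITS fraction bits,
    # y_lo is the small non-negative remainder (less than 2**-TABLE_BITS).
    index = int((mantissa - 1.0) * 2**TABLE_BITS)
    y_hi = 1.0 + index / 2**TABLE_BITS
    y_lo = mantissa - y_hi

    # First-order Taylor approximation: 1/y ~= (y_hi - y_lo) / y_hi**2.
    recip = (y_hi - y_lo) * RECIP_SQ_TABLE[index]

    # Multiply by the dividend and undo the exponent normalization.
    return sign * math.ldexp(abs(x) * recip, -exponent)
```

In this sketch, a larger `TABLE_BITS` trades memory for accuracy: each extra index bit doubles the table size and gains about two bits of precision in the quotient. A hardware version would operate on fixed-width mantissas and pipeline the lookup and the two multiplications.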