Special-purposed VLIW architecture for IEEE-754 quadruple precision elementary functions on FPGA

Authors:
Yuanwu Lei; Yong Dou; Li Shen; Jie Zhou; Song Guo
Affiliations:
National Laboratory for Parallel & Distributed Processing, National University of Defense Technology, Changsha, China 410073 (all five authors)
Venue:
ICCD '11 Proceedings of the 2011 IEEE 29th International Conference on Computer Design
Year:
2011


Cited by: "VLIW coprocessor for IEEE-754 quadruple-precision elementary functions," ACM Transactions on Architecture and Code Optimization (TACO).


Abstract

This work explores the feasibility of implementing IEEE-754-2008 quadruple-precision (Quad) elementary functions on recent FPGAs, which provide abundant embedded memory and DSP blocks. First, we analyze in detail the algorithms used to implement Quad elementary functions. Then, we present a special-purpose Very Long Instruction Word (VLIW) architecture for Quad elementary functions, the QE-Processor. The processor uses a unified hardware structure, equipped with multiple basic arithmetic units, to implement various Quad algebraic and transcendental functions, and several tradeoffs between latency and resource usage are carefully planned to avoid unbalanced resource utilization. Performance is further improved by exploiting explicit parallelism through custom VLIW instructions. Finally, we prototype the QE-Processor on Xilinx Virtex-5 and Virtex-6 FPGA chips. Experimental results show that the design guarantees a correct-rounding rate above 99.9%. Moreover, the implementation on a Virtex-6 XC6VLX760-2FF1760 FPGA running at 220 MHz outperforms a parallel software implementation based on OpenMP, running on an Intel Xeon E5620 CPU at 2.40 GHz, by a factor of 13 to 20 on special-function applications from the Boost library.
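The abstract reports a correct-rounding rate above 99.9% rather than 100%. The Python sketch below shows one way such a rate can be measured; it is not the authors' algorithm or test harness. It assumes the argument has already been range-reduced to [0, 1), uses a plain Taylor series for exp in place of whatever table-plus-polynomial scheme the QE-Processor uses, and models internal datapath width with a hypothetical `work_bits` parameter. Each result is rounded to the 113-bit binary128 significand (ties to even) and compared against a much wider reference. The names `round_to_bits`, `exp_fixed`, and `correct_rounding_rate` are illustrative only.

```python
from fractions import Fraction
import random

QUAD_PREC = 113  # binary128: 112 stored fraction bits + 1 implicit leading bit


def round_to_bits(x: Fraction, bits: int = QUAD_PREC) -> Fraction:
    """Round a positive rational x to `bits` significant bits, ties to even.
    Exponent range limits and subnormals are ignored in this sketch."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Find e with 2**e <= x < 2**(e+1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    scale = Fraction(2) ** (bits - 1 - e)
    m = x * scale  # scaled significand, in [2**(bits-1), 2**bits)
    q, r = divmod(m.numerator, m.denominator)
    # Round to nearest; on an exact tie, pick the even integer significand.
    if 2 * r > m.denominator or (2 * r == m.denominator and q % 2 == 1):
        q += 1
    return q / scale


def exp_fixed(x: Fraction, work_bits: int) -> Fraction:
    """Taylor-series exp(x) for 0 <= x < 1, truncating every intermediate to
    `work_bits` fractional bits (hypothetical model of a fixed-width datapath)."""
    one = 1 << work_bits
    xf = x.numerator * one // x.denominator  # x in fixed point, truncated
    term, total, n = one, one, 1
    while term:  # terms strictly decrease, so the loop terminates
        term = (term * xf >> work_bits) // n  # x**n / n!, truncated
        total += term
        n += 1
    return Fraction(total, one)


def correct_rounding_rate(samples: int, work_bits: int,
                          ref_bits: int = 256, seed: int = 1) -> float:
    """Fraction of random inputs whose binary128-rounded result from a
    `work_bits` datapath matches a `ref_bits` reference rounded the same way."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        x = Fraction(rng.getrandbits(64), 1 << 64)  # uniform x in [0, 1)
        ref = round_to_bits(exp_fixed(x, ref_bits))
        got = round_to_bits(exp_fixed(x, work_bits))
        hits += ref == got
    return hits / samples


if __name__ == "__main__":
    for w in (113, 120, 128, 200):
        print(f"work_bits={w}: correct-rounding rate {correct_rounding_rate(200, w)}")
```

Widening `work_bits` a few bits past 113 should push the measured rate toward 1.0 at the cost of a wider datapath, which is one way to picture the accuracy and resource tradeoff the abstract mentions. Guaranteeing 100% would additionally require detecting and resolving hard-to-round cases (the table maker's dilemma), which this sketch does not attempt.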