VLIW coprocessor for IEEE-754 quadruple-precision elementary functions

Authors:
Yuanwu Lei;Yong Dou;Lei Guo;Jinbo Xu;Jie Zhou;Yazhuo Dong;Hongjian Li
Affiliations:
National University of Defense Technology, Changsha, China;National University of Defense Technology, Changsha, China;National University of Defense Technology, Changsha, China;National University of Defense Technology, Changsha, China;National University of Defense Technology, Changsha, China;People's Liberation Army, Beijing, China;Logistics Scientific Institute, Beijing, China
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2008

Abstract

This article presents QP_VELP, a unified VLIW coprocessor for quadruple-precision (Quad) arithmetic and elementary functions, built on a common group of atomic operation units. The explicitly parallel VLIW instruction scheme and Estrin's scheme for polynomial evaluation are used to improve performance. A two-level VLIW instruction RAM scheme is introduced to achieve high scalability and customizability, even for more complex key program kernels. Finally, the Quad arithmetic accelerator (QAA), built from an array of QP_VELP units, is implemented on ASIC. Compared with a hyper-threaded software implementation on an Intel Xeon E5620, the QAA with 8 QP_VELP units achieves an 18X improvement.
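
As a rough illustration of why Estrin's scheme suits an explicitly parallel machine, the sketch below evaluates a polynomial both with Horner's rule and with Estrin's scheme. It is not the authors' implementation: the function names `horner` and `estrin` are hypothetical, and it uses ordinary Python numbers (or `fractions.Fraction` for exact checks) rather than quadruple-precision hardware arithmetic.

```python
from fractions import Fraction


def horner(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule (a serial chain)."""
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c  # each step depends on the previous one
    return acc


def estrin(coeffs, x):
    """Evaluate sum(coeffs[i] * x**i) with Estrin's scheme.

    Each pass combines adjacent pairs as lo + hi * power, where power is
    x, x**2, x**4, ... The pair combinations within one pass do not depend
    on each other, so they could in principle be issued side by side.
    """
    level = list(coeffs)
    if not level:
        return 0
    power = x
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad so every term has a partner
        level = [level[i] + level[i + 1] * power
                 for i in range(0, len(level), 2)]
        power = power * power  # x -> x**2 -> x**4 -> ...
    return level[0]


if __name__ == "__main__":
    # 1 + 2x + 3x^2 + 4x^3 at x = 2
    print(estrin([1, 2, 3, 4], 2), horner([1, 2, 3, 4], 2))
    # Exact rational check that both schemes agree
    cs = [Fraction(1, k) for k in range(1, 10)]
    print(estrin(cs, Fraction(3, 7)) == horner(cs, Fraction(3, 7)))
```

For a polynomial with n coefficients, Horner's rule needs a chain of about n dependent multiply-adds, whereas Estrin's scheme needs only about log2(n) dependent passes, at the cost of extra squarings. That shorter dependency chain is what lets independent operation units work in parallel.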