This article presents QP_VELP, a unified VLIW coprocessor for quadruple-precision (Quad) arithmetic and elementary functions, built on a common group of atomic operation units. Performance is improved by exploiting the explicit parallelism of VLIW instructions and by evaluating polynomials with Estrin's scheme. A two-level VLIW instruction RAM scheme provides high scalability and customizability, even for more complex key program kernels. Finally, a Quad arithmetic accelerator (QAA) containing an array of QP_VELP units is implemented as an ASIC. Compared with a hyper-threaded software implementation on an Intel Xeon E5620, a QAA with 8 QP_VELP units achieves an 18X performance improvement.
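To illustrate why Estrin's scheme suits an explicitly parallel VLIW datapath, the following is a minimal Python sketch that contrasts it with Horner's rule. It is not the QP_VELP implementation: exact rational arithmetic via `fractions.Fraction` stands in for quadruple-precision hardware, and the function names `horner` and `estrin` are hypothetical. The assumption is only that the independent multiply-adds within one Estrin level are what a VLIW bundle could issue together.

```python
from fractions import Fraction
from math import factorial
from typing import Sequence


def horner(coeffs: Sequence, x):
    """Evaluate sum(coeffs[i] * x**i) with Horner's rule.

    Every step depends on the previous accumulator, so a degree-n
    polynomial forms a serial chain of n multiply-adds.
    """
    acc = 0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def estrin(coeffs: Sequence, x):
    """Evaluate sum(coeffs[i] * x**i) with Estrin's scheme.

    Each level combines adjacent pairs as c[2i] + c[2i+1] * p, where p is
    x, then x**2, then x**4, and so on. The pair operations within one level
    do not depend on each other, so they could be issued in parallel
    (for example, in the slots of one VLIW instruction; an assumption here).
    """
    level = list(coeffs)  # copy so the caller's sequence is not modified
    if not level:
        return 0
    power = x
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad an odd-length level with a zero coefficient
        level = [level[i] + level[i + 1] * power for i in range(0, len(level), 2)]
        power = power * power  # next level pairs terms with the squared power
    return level[0]


# Usage: truncated Taylor series of exp(x) at x = 1/2, evaluated exactly.
exp_coeffs = [Fraction(1, factorial(k)) for k in range(8)]
x = Fraction(1, 2)
assert estrin(exp_coeffs, x) == horner(exp_coeffs, x)
```

For a degree-n polynomial, Horner's rule needs a dependent chain of n multiply-adds, while Estrin's scheme needs about ceil(log2(n + 1)) levels (3 levels for the 8 coefficients above), at the cost of also computing the powers x^2, x^4, and so on. That shorter critical path is what makes the scheme attractive when many operation units are available at once.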