We implemented the quadruple precision Basic Linear Algebra Subprograms (BLAS) functions AXPY, GEMV, and GEMM on graphics processing units (GPUs) and evaluated their performance. We used DD-type (double-double) quadruple precision operations, which combine two double precision values to represent one quadruple precision value. On an NVIDIA Tesla C1060, our BLAS functions are up to approximately 30 times faster than the existing quadruple precision BLAS on an Intel Core i7 920. Moreover, quadruple precision AXPY takes only approximately 2.7 times longer than double precision AXPY on the Tesla C1060. These results show that quadruple precision BLAS operations are well suited to GPUs.