We implemented the quadruple precision Basic Linear Algebra Subprograms (BLAS) functions AXPY, GEMV, and GEMM on graphics processing units (GPUs) and evaluated their performance. We used DD-type (double-double) quadruple precision operations, which combine two double precision values to represent one quadruple precision value. On an NVIDIA Tesla C1060, our BLAS functions are up to approximately 30 times faster than the existing quadruple precision BLAS on an Intel Core i7 920. Moreover, quadruple precision AXPY takes only approximately 2.7 times longer than double precision AXPY on the Tesla C1060. These results show that quadruple precision BLAS operations are well suited to GPUs.