Implementation and evaluation of quadruple precision BLAS functions on GPUs

  • Authors: Daichi Mukunoki, Daisuke Takahashi

  • Affiliations: Graduate School of Systems and Information Engineering, University of Tsukuba, Tsukuba, Ibaraki, Japan (both authors)

  • Venue: PARA'10: Proceedings of the 10th International Conference on Applied Parallel and Scientific Computing, Volume Part I
  • Year: 2010

Abstract

We implemented three quadruple precision Basic Linear Algebra Subprograms (BLAS) functions, AXPY, GEMV, and GEMM, on graphics processing units (GPUs) and evaluated their performance. We used double-double (DD) type quadruple precision operations, which represent a quadruple precision value as a pair of double precision values. On an NVIDIA Tesla C1060, our BLAS functions are up to approximately 30 times faster than the existing quadruple precision BLAS on an Intel Core i7 920. Additionally, on the Tesla C1060, quadruple precision AXPY takes only approximately 2.7 times longer to execute than double precision AXPY. These results show that quadruple precision BLAS operations are well suited to GPUs.
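
To make the DD representation concrete, here is a minimal sketch of DD addition, DD multiplication, and a DD AXPY (y = alpha * x + y) in Python, whose floats are IEEE 754 doubles. It is not the authors' GPU code: the abstract does not say which algorithms were used, so the sketch assumes the standard error-free transformations (Knuth's two-sum and Dekker's split-based two-product) and a simplified addition that skips a second renormalization step. All function names (`two_sum`, `dd_add`, `dd_axpy`, and so on) are illustrative, and overflow, infinities, and NaNs are not handled.

```python
def two_sum(a, b):
    """Return (s, e) with s = fl(a + b) and s + e == a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def quick_two_sum(a, b):
    """Same as two_sum, but valid only when |a| >= |b| (or a == 0)."""
    s = a + b
    e = b - (s - a)
    return s, e


def split(a):
    """Split a double into hi + lo, each half fitting in 26 bits."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Return (p, e) with p = fl(a * b) and p + e == a * b exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e


def dd_add(x, y):
    """Add two DD values, each a (high, low) pair of doubles."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]  # assumption: simplified handling of the low words
    return quick_two_sum(s, e)


def dd_mul(x, y):
    """Multiply two DD values; the low*low term is dropped as negligible."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return quick_two_sum(p, e)


def dd_axpy(alpha, xs, ys):
    """Return alpha * x + y element-wise for lists of DD values."""
    return [dd_add(dd_mul(alpha, x), y) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    one, tiny = (1.0, 0.0), (2.0 ** -60, 0.0)
    print(1.0 + 2.0 ** -60 == 1.0)  # plain double arithmetic drops the small term
    print(dd_add(one, tiny))        # the low word of the DD result keeps it
```

Each DD operation expands into a fixed sequence of double precision operations with no data-dependent branching. On hardware with a fused multiply-add instruction, `two_prod` is usually written with a single FMA instead of the splitting step shown here.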