Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications

  • Authors:
  • Katsuhisa Ozaki; Takeshi Ogita; Shin'ichi Oishi; Siegfried M. Rump

  • Affiliations:
  • Katsuhisa Ozaki: Department of Mathematical Sciences, Shibaura Institute of Technology, Saitama 337-8570, Japan; Japan Science and Technology Agency (JST)/CREST, Tokyo, Japan
  • Takeshi Ogita: Japan Science and Technology Agency (JST)/CREST, Tokyo, Japan; Division of Mathematical Sciences, Tokyo Woman's Christian University, Tokyo 167-8585, Japan
  • Shin'ichi Oishi: Japan Science and Technology Agency (JST)/CREST, Tokyo, Japan; Faculty of Science and Engineering, Waseda University, Tokyo 169-0072, Japan
  • Siegfried M. Rump: Faculty of Science and Engineering, Waseda University, Tokyo 169-0072, Japan; Institute for Reliable Computing, Hamburg University of Technology, 21071 Hamburg, Germany

  • Venue:
  • Numerical Algorithms
  • Year:
  • 2012

Abstract

This paper is concerned with accurate matrix multiplication in floating-point arithmetic. Recently, an accurate summation algorithm was developed by Rump et al. (SIAM J Sci Comput 31(1):189–224, 2008). The key technique of their method is a fast error-free splitting of floating-point numbers. Using this technique, we first develop an error-free transformation of a product of two floating-point matrices into a sum of floating-point matrices. Next, we partially apply this error-free transformation and develop an algorithm which aims to output an accurate approximation of the matrix product. In addition, an a priori error estimate is given. A characteristic of the proposed method is that the dominant part of the algorithm, in terms of both computation and memory consumption, consists of ordinary floating-point matrix multiplications. Since matrix multiplication is highly optimized in BLAS, our algorithms achieve good computational performance. Although they require a significant amount of working memory, our algorithms are significantly faster than `gemmx' in XBLAS when the matrix dimensions are large enough for `gemm' to run at near-peak performance. Numerical examples illustrate the efficiency of the proposed method.
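To make the splitting idea in the abstract concrete, below is a minimal NumPy sketch of a one-level error-free transformation of a matrix product, assuming binary64 arithmetic with round-to-nearest and no overflow or underflow. The function names (split_rows, eft_matmul_one_level) and the exact shift width are illustrative choices, not the paper's notation; the paper's full algorithm recurses on the low-order parts until nothing remains, whereas this sketch stops after a single split.

    import numpy as np

    def split_rows(A, k, t=53):
        # One error-free splitting step in the style of Rump/Ogita/Oishi:
        # A = A_hi + A_lo holds exactly, and A_hi keeps few enough leading
        # bits that length-k inner products of such high-order parts
        # accumulate without rounding error. Assumes binary64 with
        # round-to-nearest and no overflow (illustrative shift width).
        mu = np.max(np.abs(A), axis=1, keepdims=True)
        mu[mu == 0] = 1.0                      # zero rows need no shift
        shift = np.ceil(np.log2(mu)) + np.ceil((np.log2(k) + t) / 2)
        sigma = 2.0 ** shift
        A_hi = (A + sigma) - sigma             # rounds away the low-order bits
        A_lo = A - A_hi                        # exact remainder
        return A_hi, A_lo

    def eft_matmul_one_level(A, B):
        # One level of the error-free transformation: A @ B is rewritten
        # as an unevaluated sum of four ordinary floating-point products,
        # of which A_hi @ B_hi is computed exactly by plain gemm.
        k = A.shape[1]
        A_hi, A_lo = split_rows(A, k)
        Bt_hi, Bt_lo = split_rows(B.T, k)      # split B column-wise
        B_hi, B_lo = Bt_hi.T, Bt_lo.T
        return A_hi @ B_hi, A_hi @ B_lo, A_lo @ B_hi, A_lo @ B_lo

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        A = rng.standard_normal((200, 200))
        B = rng.standard_normal((200, 200))
        parts = eft_matmul_one_level(A, B)
        approx = sum(parts[::-1])              # summing small parts first helps
        print(np.max(np.abs(approx - A @ B)))

Note that each of the four partial products is an ordinary dense matrix multiplication, so the dominant cost stays inside optimized BLAS routines; this is precisely the property the abstract emphasizes as the source of the method's speed.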