Improving numerical accuracy for non-negative matrix multiplication on GPUs using recursive algorithms

  • Authors:
  • Matthew Badin;Paolo D'Alberto;Lubomir Bic;Michael Dillencourt;Alexandru Nicolau

  • Affiliations:
  • University of California Irvine, Irvine, CA, USA;FastMMW, Sunnyvale, CA, USA;University of California Irvine, Irvine, CA, USA;University of California Irvine, Irvine, CA, USA;University of California Irvine, Irvine, CA, USA

  • Venue:
  • Proceedings of the 27th International ACM Conference on Supercomputing
  • Year:
  • 2013

Abstract

Scientific computing is only bound by the limits of Moore's Law and the scalability of high performance mathematical library implementations. Most mathematical libraries, however, tend to focus only on general inputs, limiting their potential performance and scalability by not tailoring their implementation to specific inputs, such as non-negative inputs. By removing this limitation it is possible to improve the performance and accuracy of a range of problems. In this paper we explore the limitations of hardware to improve accuracy of non-negative matrix multiply by specifically comparing implementations on the GPU and CPU and propose algorithmic solutions to improve accuracy. Next, we demonstrate a matrix multiply implementation that takes advantage of asymptotically fast matrix multiply algorithms, which have been shown to scale better than O(N^3) matrix multiply implementations, and improve accuracy by up to a whole digit while increasing performance by up to 27% for matrices where the input is positive. Finally, we propose to extend the BLAS level 3 specification to non-negative matrices to allow easy integration of our solution and allow other library authors to implement their own solutions as part of an existing standard.
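
For readers unfamiliar with the term, the following is a minimal sketch of one well-known asymptotically fast recursive algorithm, Strassen's method, which uses seven half-size products per level instead of eight. The abstract does not say which fast algorithm the authors use, so this is an illustration only and not their implementation. The function names (`strassen`, `naive_mul`, `split`) and the `cutoff` parameter are hypothetical, and the sketch is plain CPU Python for square matrices rather than GPU code.

```python
def add(A, B):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    """Element-wise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def naive_mul(A, B):
    """Classical O(N^3) product, used as the recursion base case."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def split(M):
    """Split a square matrix of even size into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=2):
    """Strassen product of square matrices (illustrative sketch).

    Falls back to the classical product when the size is at most
    `cutoff` or is odd, so any square size is accepted.
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return naive_mul(A, B)
    a11, a12, a21, a22 = split(A)
    b11, b12, b21, b22 = split(B)
    # Seven recursive products instead of the classical eight.
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    # Recombine the products into the four quadrants of the result.
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

Note that this scheme forms differences such as `b12 - b22` even when every input entry is non-negative, so floating-point cancellation can occur in the intermediate products. That is one reason the accuracy of fast algorithms deserves separate attention from that of the classical product.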