Improving the Accuracy of High Performance BLAS Implementations Using Adaptive Blocked Algorithms

  • Authors:
  • Matthew Badin; Paolo D'Alberto; Lubomir Bic; Michael Dillencourt; Alexandru Nicolau

  • Venue:
  • SBAC-PAD '11 Proceedings of the 2011 23rd International Symposium on Computer Architecture and High Performance Computing
  • Year:
  • 2011

Abstract

Matrix multiply is ubiquitous in scientific computing. Considerable effort has been spent on improving its performance. Once methods that make efficient use of the processor have been exhausted, methods that use fewer operations than the canonical matrix multiply must be explored. Combining the two methods yields a hybrid matrix multiply algorithm. Hybrid matrix multiply algorithms tend to be less accurate than the canonical matrix multiply implementation, leaving room for improvement. There are well-known techniques for improving accuracy, but they tend to be slow, and it is not immediately obvious how best to apply them to hybrid algorithms without lowering performance. Previous attempts have focused on the bottom of the hybrid matrix multiply algorithm, modifying the high-performance matrix multiply implementation. In contrast, the top-down approach presented here does not require the modification of the high-performance matrix multiply implementation at the bottom, nor does it require modification of the fast asymptotic matrix multiply algorithm at the top. The three-level hybrid algorithm presented here not only has up to 10% better performance than the fastest high-performance matrix multiply, but is also more accurate.
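
The general shape of a hybrid matrix multiply can be illustrated with a minimal two-level sketch. This is not the authors' three-level adaptive algorithm: the abstract does not name the fast algorithm used at the top, so Strassen's algorithm is assumed here, and a plain triple loop stands in for the tuned BLAS kernel at the bottom. The function names and the `cutoff` parameter are hypothetical, and the sketch assumes square matrices whose size is a power of two.

```python
def classical_matmul(A, B):
    """Canonical O(n^3) product; a stand-in for a tuned BLAS GEMM kernel."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split a square matrix of even size into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def hybrid_matmul(A, B, cutoff=2):
    """Strassen recursion at the top, classical kernel at or below `cutoff`.

    Assumption: A and B are square with a power-of-two size.
    """
    n = len(A)
    if n <= cutoff:
        return classical_matmul(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven recursive products instead of the eight a classical split needs.
    M1 = hybrid_matmul(add(A11, A22), add(B11, B22), cutoff)
    M2 = hybrid_matmul(add(A21, A22), B11, cutoff)
    M3 = hybrid_matmul(A11, sub(B12, B22), cutoff)
    M4 = hybrid_matmul(A22, sub(B21, B11), cutoff)
    M5 = hybrid_matmul(add(A11, A12), B22, cutoff)
    M6 = hybrid_matmul(sub(A21, A11), add(B11, B12), cutoff)
    M7 = hybrid_matmul(sub(A12, A22), add(B21, B22), cutoff)

    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    # Reassemble the four quadrants into one matrix.
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

With integer inputs the hybrid and classical results agree exactly. With floating-point inputs they generally differ in the low-order bits, because the extra additions and subtractions at the top level introduce additional rounding; this is the accuracy gap the abstract refers to.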