Automatically tuning sparse matrix-vector multiplication for GPU architectures

  • Authors:
  • Alexander Monakov;Anton Lokhmotov;Arutyun Avetisyan

  • Affiliations:
  • Institute for System Programming of RAS, Moscow, Russian Federation;Department of Computing, Imperial College London, London, United Kingdom;Institute for System Programming of RAS, Moscow, Russian Federation

  • Venue:
  • HiPEAC'10: Proceedings of the 5th International Conference on High Performance Embedded Architectures and Compilers
  • Year:
  • 2010

Abstract

Graphics processors are increasingly used in scientific applications because of their high computational power, which comes from hardware with multiple levels of parallelism and a memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to parallelize efficiently for GPUs because of their irregular memory access patterns. In this paper we present a new storage format for sparse matrices that better exploits locality, has a low memory footprint, and enables automatic specialization for various matrices and future devices via parameter tuning. Experimental evaluation demonstrates significant speedups compared to previously published results.
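
To illustrate the irregular memory references the abstract mentions, here is a minimal sketch of sparse matrix-vector multiplication over the conventional compressed sparse row (CSR) layout. This is a generic baseline, not the storage format proposed in the paper, which the abstract does not describe. The names `csr_spmv`, `row_ptr`, `col_idx`, and `values` are assumptions chosen for illustration.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    (Illustrative baseline only, not the paper's format.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect load: x is read at a data-dependent position,
            # so neighbouring rows may touch unrelated parts of x.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small 3x3 example:
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]
y = csr_spmv(row_ptr, col_idx, values, x)
```

The inner loop reads `x[col_idx[k]]`, so which elements of `x` are accessed depends on the sparsity pattern, and rows can have different lengths. Both properties make it harder to map rows onto GPU threads with regular, contiguous memory accesses and balanced work.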