All-pairs computations on many-core graphics processors

  • Authors:
  • Abhinav Sarje; Srinivas Aluru

  • Affiliations:
  • Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA; Department of Electrical and Computer Engineering, Iowa State University, Ames, IA 50011, USA

  • Venue:
  • Parallel Computing
  • Year:
  • 2013

Abstract

Developing high-performance applications on emerging multi- and many-core architectures requires efficient mapping techniques and architecture-specific tuning methodologies to realize performance close to their peak compute capability and memory bandwidth. In this paper, we develop architecture-aware methods to accelerate all-pairs computations on many-core graphics processors. Pairwise computations occur frequently in numerous application areas in scientific computing. While they appear easy to parallelize because each pairwise interaction can be computed independently of all others, developing techniques that address multi-layered memory hierarchies, map the computation within the restrictions imposed by the small, low-latency on-chip memories, and strike the right balance between concurrency, data reuse, and memory traffic is crucial to obtaining high performance. We present a hierarchical decomposition scheme for GPUs based on decomposition of the output matrix and the input data. We demonstrate that careful tuning of the associated decomposition parameters is essential to achieve high efficiency on GPUs. We also compare the performance of our strategies with an implementation on the STI Cell processor as well as multi-core CPU parallelizations using OpenMP and Intel Threading Building Blocks.
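To make the decomposition idea concrete, the following is a minimal CUDA sketch of a tiled all-pairs computation: each thread block owns a tile of the output matrix and stages the corresponding input vectors through shared memory before evaluating the pairwise interaction. The tile width, vector dimensionality, and the squared-Euclidean interaction are illustrative assumptions for this sketch, not the decomposition or tuning parameters used in the paper.

```cuda
// Hedged sketch of a tiled all-pairs kernel. Each block computes a
// TILE x TILE tile of the n x n output matrix; the needed input rows
// and columns are staged once through shared memory.
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 16   // assumed tile width (a tunable decomposition parameter)
#define DIM   8   // assumed dimensionality of each input vector (TILE >= DIM)

__global__ void allPairsTiled(const float* __restrict__ x, float* out, int n)
{
    __shared__ float rowBuf[TILE][DIM];   // input vectors for this tile's rows
    __shared__ float colBuf[TILE][DIM];   // input vectors for this tile's columns

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;

    // Cooperatively load the input vectors needed by this tile.
    if (threadIdx.x < DIM && row < n)
        rowBuf[threadIdx.y][threadIdx.x] = x[row * DIM + threadIdx.x];
    if (threadIdx.y < DIM && col < n)
        colBuf[threadIdx.x][threadIdx.y] = x[col * DIM + threadIdx.y];
    __syncthreads();

    if (row < n && col < n) {
        float acc = 0.0f;
        for (int d = 0; d < DIM; ++d) {
            float diff = rowBuf[threadIdx.y][d] - colBuf[threadIdx.x][d];
            acc += diff * diff;           // example pairwise interaction
        }
        out[(size_t)row * n + col] = acc;
    }
}

int main()
{
    const int n = 1024;
    float *x, *out;
    cudaMallocManaged(&x, n * DIM * sizeof(float));
    cudaMallocManaged(&out, (size_t)n * n * sizeof(float));
    for (int i = 0; i < n * DIM; ++i) x[i] = (float)(i % 7);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    allPairsTiled<<<grid, block>>>(x, out, n);
    cudaDeviceSynchronize();

    printf("out[1][2] = %f\n", out[1 * n + 2]);
    cudaFree(x);
    cudaFree(out);
    return 0;
}
```

In this sketch the tile width plays the role of one of the tunable decomposition parameters discussed in the paper: larger tiles increase on-chip data reuse but consume more shared memory and can reduce occupancy, so the best value must be tuned per architecture.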