Singe: leveraging warp specialization for high performance on GPUs

  • Authors:
  • Michael Bauer;Sean Treichler;Alex Aiken

  • Affiliations:
  • Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA

  • Venue:
  • Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present Singe, a Domain Specific Language (DSL) compiler for combustion chemistry that leverages warp specialization to produce high performance code for GPUs. Instead of relying on traditional GPU programming models that emphasize data-parallel computations, warp specialization allows compilers like Singe to partition computations into sub-computations which are then assigned to different warps within a thread block. Fine-grain synchronization between warps is performed efficiently in hardware using producer-consumer named barriers. Partitioning computations using warp specialization allows Singe to deal efficiently with the irregularity in both data access patterns and computation. Furthermore, warp-specialized partitioning of computations allows Singe to fit extremely large working sets into on-chip memories. Finally, we describe the architecture and general compilation techniques necessary for constructing a warp-specializing compiler. We show that the warp-specialized code emitted by Singe is up to 3.75X faster than previously optimized data-parallel GPU kernels.