Static GPU threads and an improved scan algorithm

  • Authors:
  • Jens Breitbart

  • Affiliations:
  • Research Group Programming Languages / Methodologies, Universität Kassel, Kassel, Germany

  • Venue:
  • Euro-Par 2010 Proceedings of the 2010 conference on Parallel processing
  • Year:
  • 2010

Abstract

Current GPU programming systems automatically distribute work across all GPU processors based on a set of fixed assumptions, e.g. that all tasks are independent of each other. We show that automatic distribution limits algorithmic design, and demonstrate that manual work distribution adds hardly any overhead. Our Scan+ algorithm is an improved scan relying on manual work distribution. It uses global barriers and task interleaving to provide almost twice the performance of Apple's reference implementation.
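
The idea of a scan built on manual work distribution and a global barrier can be illustrated with a minimal CPU-side sketch. This is not the Scan+ algorithm from the paper and omits task interleaving entirely; it only assumes a fixed pool of worker threads standing in for static GPU threads, with Python's threading.Barrier standing in for a global barrier. The function name static_thread_scan and the parameter num_workers are hypothetical, chosen for illustration.

```python
import threading


def static_thread_scan(values, num_workers=4):
    """Inclusive prefix sum with a fixed set of workers and one global barrier.

    Illustrative CPU analogy only; not the paper's Scan+ algorithm.
    """
    n = len(values)
    out = [0] * n
    block_sums = [0] * num_workers
    barrier = threading.Barrier(num_workers)
    # Manual, static work distribution: one contiguous chunk per worker.
    chunk = (n + num_workers - 1) // num_workers

    def worker(wid):
        lo = min(wid * chunk, n)
        hi = min(lo + chunk, n)

        # Phase 1: each worker scans its own chunk independently.
        running = 0
        for i in range(lo, hi):
            running += values[i]
            out[i] = running
        block_sums[wid] = running

        # Global barrier: all chunk totals must be written before any is read.
        barrier.wait()

        # Phase 2: add the total of all preceding chunks to this chunk.
        offset = sum(block_sums[:wid])
        for i in range(lo, hi):
            out[i] += offset

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

The second phase depends on results from every other worker, which is exactly the kind of inter-task dependency that a work distribution assuming independent tasks cannot express without splitting the computation into separate launches.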