Nested data-parallelism on the GPU

  • Authors:
  • Lars Bergstrom; John Reppy

  • Affiliations:
  • University of Chicago, Chicago, IL, USA; University of Chicago, Chicago, IL, USA

  • Venue:
  • Proceedings of the 17th ACM SIGPLAN International Conference on Functional Programming (ICFP)
  • Year:
  • 2012

Abstract

Graphics processing units (GPUs) provide far greater memory bandwidth and arithmetic performance than CPUs but, because of their Single-Instruction-Multiple-Data (SIMD) architecture, they are hard to program. Most of the programs ported to GPUs thus far use traditional data-level parallelism, performing only operations that apply uniformly over vectors. NESL is a first-order functional language that was designed to allow programmers to write irregular-parallel programs, such as parallel divide-and-conquer algorithms, for wide-vector parallel computers. This paper presents our port of the NESL implementation to GPUs and provides empirical evidence that nested data-parallelism (NDP) on GPUs significantly outperforms CPU-based implementations and matches or beats newer GPU languages that support only flat parallelism. While our performance does not match that of hand-tuned CUDA programs, we argue that the notational conciseness of NESL is worth the loss in performance. This work provides the first language implementation that directly supports NDP on a GPU.
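
To make the distinction between flat and nested data-parallelism concrete, the sketch below (not taken from the paper, and written in Haskell rather than NESL) expresses a sparse matrix-vector product, a classic irregular computation in which the amount of work per row varies. Both the outer loop over rows and the inner reduction over a row's nonzeros are conceptually parallel; this nesting of parallel operations inside parallel operations is exactly what flat data-parallel models cannot express directly and what NESL-style flattening turns into uniform vector operations.

  -- A minimal Haskell sketch (not the paper's NESL code) of an irregular,
  -- nested data-parallel computation: sparse matrix-vector multiplication.
  -- Each row is a list of (column index, value) pairs of varying length,
  -- so the inner "parallel loop" has a different trip count per row.

  type SparseRow    = [(Int, Double)]   -- (column, value) pairs for the nonzeros
  type SparseMatrix = [SparseRow]       -- rows may have different lengths

  -- Conceptually, both the outer map (over rows) and the inner sum
  -- (over a row's nonzeros) are parallel operations.
  sparseMatVec :: SparseMatrix -> [Double] -> [Double]
  sparseMatVec m v = map dotRow m
    where
      dotRow row = sum [x * (v !! i) | (i, x) <- row]

  main :: IO ()
  main = print (sparseMatVec [[(0, 2.0)], [(0, 1.0), (2, 3.0)]] [1.0, 4.0, 5.0])
  -- prints [2.0,16.0]

Written in NESL, the same computation would use nested apply-to-each comprehensions, and the implementation described in the abstract is responsible for flattening that nesting so it maps onto the GPU's wide SIMD hardware.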