ViperVM: a runtime system for parallel functional high-performance computing on heterogeneous architectures

  • Authors:
  • Sylvain Henry

  • Affiliations:
  • University of Bordeaux, Bordeaux, France

  • Venue:
  • Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

The current trend in high-performance computing is to use heterogeneous architectures (i.e. multi-core with accelerators such as GPUs or Xeon Phi) because they offer very good performance over energy consumption ratios. Programming these architectures is notoriously hard, hence their use is still somewhat restricted to parallel programming experts. The situation is improving with frameworks using high-level programming models to generate efficient computation kernels for these new accelerator architectures. However, an orthogonal issue is to efficiently manage memory and kernel scheduling especially on architectures containing multiple accelerators. Task graph based runtime systems have been a first step toward efficiently automatizing these tasks. However they introduce new challenges of their own such as task granularity adaptation that cannot be easily automatized. In this paper, we present a programming model and a preliminary implementation of a runtime system called ViperVM that takes advantage of parallel functional programming to extend task graph based runtime systems. The main idea is to substitute dynamically created task graphs with pure functional programs that are evaluated in parallel by the runtime system. Programmers can associate kernels (written in OpenCL, CUDA, Fortran...) to identifiers that can then be used as pure functions in programs. During parallel evaluation, the runtime system automatically schedules kernels on available accelerators when it has to reduce one of these identifiers. An extension of this mechanism consists in associating both a kernel and a functional expression to the same identifier and to let the runtime system decide either to execute the kernel or to evaluate the expression. We show that this mechanism can be used to perform dynamic granularity adaptation.