ViperVM: a runtime system for parallel functional high-performance computing on heterogeneous architectures

Authors:
Sylvain Henry
Affiliations:
University of Bordeaux, Bordeaux, France
Venue:
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Year:
2013

Abstract

The current trend in high-performance computing is to use heterogeneous architectures (i.e., multi-core processors with accelerators such as GPUs or Xeon Phi) because they offer very good performance-to-energy-consumption ratios. Programming these architectures is notoriously hard, so their use is still somewhat restricted to parallel programming experts. The situation is improving with frameworks that use high-level programming models to generate efficient computation kernels for these new accelerator architectures. However, an orthogonal issue is to manage memory and kernel scheduling efficiently, especially on architectures containing multiple accelerators. Task-graph-based runtime systems have been a first step toward automating these tasks efficiently. However, they introduce new challenges of their own, such as task granularity adaptation, that cannot be easily automated. In this paper, we present a programming model and a preliminary implementation of a runtime system called ViperVM that takes advantage of parallel functional programming to extend task-graph-based runtime systems. The main idea is to replace dynamically created task graphs with pure functional programs that are evaluated in parallel by the runtime system. Programmers can associate kernels (written in OpenCL, CUDA, Fortran, etc.) with identifiers that can then be used as pure functions in programs. During parallel evaluation, the runtime system automatically schedules kernels on available accelerators when it has to reduce one of these identifiers. An extension of this mechanism consists in associating both a kernel and a functional expression with the same identifier and letting the runtime system decide whether to execute the kernel or to evaluate the expression. We show that this mechanism can be used to perform dynamic granularity adaptation.
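
The kernel-or-expression choice described in the abstract can be illustrated with a small sketch. This is not ViperVM's API: the names `Binding`, `Runtime`, `reduce`, and `max_kernel_size` are hypothetical, the "kernel" is an ordinary Python function standing in for OpenCL or CUDA code, and the size-threshold policy is an assumption, since the abstract does not say how the runtime decides. Evaluation here is sequential; a real runtime would evaluate the independent sub-reductions in parallel.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Binding:
    """Hypothetical binding of one identifier to a kernel and, optionally,
    to an equivalent functional expression."""
    kernel: Callable[[List[float]], float]
    # The expression receives the runtime so it can reduce sub-calls.
    expression: Optional[Callable[["Runtime", List[float]], float]] = None


class Runtime:
    """Toy evaluator: reduces an identifier by running its kernel or by
    evaluating its functional expression."""

    def __init__(self, max_kernel_size: int) -> None:
        if max_kernel_size < 1:
            raise ValueError("max_kernel_size must be at least 1")
        self.max_kernel_size = max_kernel_size
        self.bindings: Dict[str, Binding] = {}
        self.kernel_calls = 0  # counts how many kernels were launched

    def bind(self, name: str, binding: Binding) -> None:
        self.bindings[name] = binding

    def reduce(self, name: str, data: List[float]) -> float:
        binding = self.bindings[name]
        # Assumed policy: inputs larger than the threshold are split by
        # evaluating the expression; smaller ones go straight to the kernel.
        if binding.expression is not None and len(data) > self.max_kernel_size:
            return binding.expression(self, data)
        self.kernel_calls += 1
        return binding.kernel(data)


def sum_kernel(data: List[float]) -> float:
    """Stand-in for a device kernel that sums a whole block at once."""
    return float(sum(data))


def sum_expression(rt: Runtime, data: List[float]) -> float:
    """Functional definition of the same identifier: split and recurse."""
    mid = len(data) // 2
    return rt.reduce("sum", data[:mid]) + rt.reduce("sum", data[mid:])


runtime = Runtime(max_kernel_size=4)
runtime.bind("sum", Binding(kernel=sum_kernel, expression=sum_expression))
total = runtime.reduce("sum", [float(i) for i in range(16)])
```

With a threshold of 4, the 16-element input is split twice and summed by four kernel calls of four elements each. Raising `max_kernel_size` to 16 or more makes the same program run as a single coarse-grained kernel call, which is the sense in which the choice adapts task granularity without changing the program.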