The current trend in high-performance computing is to use heterogeneous architectures (i.e., multi-core CPUs combined with accelerators such as GPUs or Xeon Phi) because they offer very good performance-per-watt ratios. Programming these architectures is notoriously hard, so their use is still largely restricted to parallel programming experts. The situation is improving with frameworks that use high-level programming models to generate efficient computation kernels for these new accelerator architectures. However, an orthogonal issue is the efficient management of memory and kernel scheduling, especially on architectures containing multiple accelerators. Task-graph-based runtime systems have been a first step toward automating these tasks efficiently. However, they introduce new challenges of their own, such as task granularity adaptation, that cannot easily be automated. In this paper, we present a programming model and a preliminary implementation of a runtime system called ViperVM that takes advantage of parallel functional programming to extend task-graph-based runtime systems. The main idea is to replace dynamically created task graphs with pure functional programs that are evaluated in parallel by the runtime system. Programmers can associate kernels (written in OpenCL, CUDA, Fortran...) with identifiers that can then be used as pure functions in programs. During parallel evaluation, the runtime system automatically schedules kernels on available accelerators whenever it has to reduce one of these identifiers. An extension of this mechanism consists of associating both a kernel and a functional expression with the same identifier and letting the runtime system decide whether to execute the kernel or to evaluate the expression. We show that this mechanism can be used to perform dynamic granularity adaptation.
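The kernel-or-expression choice described above can be illustrated with a small sketch. It is not ViperVM's implementation or API: the names `Runtime`, `Binding`, `bind`, `reduce`, and `max_kernel_size` are hypothetical, the "kernel" is an ordinary Python function standing in for an OpenCL or CUDA kernel, evaluation is sequential rather than parallel, and the size threshold is an assumed decision policy chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass
class Binding:
    """Hypothetical registry entry: a kernel plus an optional functional expression."""
    kernel: Callable[[Sequence[float]], float]
    # The expansion rewrites one call into calls on smaller inputs.
    expansion: Optional[Callable[["Runtime", Sequence[float]], float]] = None


class Runtime:
    """Toy evaluator (assumption: sequential, one size threshold as policy)."""

    def __init__(self, max_kernel_size: int) -> None:
        self.max_kernel_size = max_kernel_size
        self.bindings: Dict[str, Binding] = {}
        self.kernel_calls: List[Tuple[str, int]] = []  # (identifier, input size)

    def bind(self, name, kernel, expansion=None) -> None:
        self.bindings[name] = Binding(kernel, expansion)

    def reduce(self, name: str, data: Sequence[float]) -> float:
        binding = self.bindings[name]
        # Large input and an expression is available: evaluate the expression,
        # which yields finer-grained reductions of the same identifier.
        if binding.expansion is not None and len(data) > self.max_kernel_size:
            return binding.expansion(self, data)
        # Otherwise run the kernel on the whole input as a single task.
        self.kernel_calls.append((name, len(data)))
        return binding.kernel(data)


def sum_kernel(xs: Sequence[float]) -> float:
    # Stand-in for an accelerator kernel.
    return float(sum(xs))


def sum_expansion(rt: Runtime, xs: Sequence[float]) -> float:
    # Functional definition of the same identifier: split, reduce, combine.
    mid = len(xs) // 2
    return rt.reduce("sum", xs[:mid]) + rt.reduce("sum", xs[mid:])


fine = Runtime(max_kernel_size=4)
fine.bind("sum", sum_kernel, sum_expansion)
fine_total = fine.reduce("sum", list(range(16)))

coarse = Runtime(max_kernel_size=16)
coarse.bind("sum", sum_kernel, sum_expansion)
coarse_total = coarse.reduce("sum", list(range(16)))
```

With a threshold of 4, the 16-element input is reduced through four kernel calls of size 4; with a threshold of 16, it is handled by a single kernel call. Both runs compute the same result, and only the task granularity differs. A real runtime would presumably base the decision on richer information, such as device load or data location, rather than a fixed size.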