Declarative data-parallel programming with the accelerator system

  • Authors:
  • Satnam Singh

  • Affiliations:
  • Microsoft, Cambridge, United Kingdom

  • Venue:
  • Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming
  • Year:
  • 2010


Abstract

The Accelerator project at Microsoft Research is developing a data-parallel library which provides a high-level and accessible mechanism for producing code that executes on GPUs (via DirectX) and on x64 multi-cores using SIMD instructions. An experimental target can also produce VHDL netlists which can be implemented on FPGA circuits. Although the library is developed in a mainstream imperative language, the user programs in what is essentially a functional embedded domain-specific language. The library provides data-parallel arrays and data-parallel operations, e.g. element-wise operations, reductions, and matrix transformations. It is also possible to layer higher-level domain-specific data-parallel languages on top of Accelerator; e.g. parallel bitonic sorters and mergers (e.g. Batcher's) have been expressed in a combinator-based library in F# which has appealing properties for composing computations through the use of higher-order functions. A key distinction between the Accelerator approach for generating GPU code and the CUDA path supported by NVIDIA is that Accelerator works on-line by jit-ting rather than off-line by generating programs that need to be further compiled and executed. This greatly simplifies the usage model for the programmer. The circuit-generator target for Accelerator cannot work by jit-ting, so it works in off-line mode. The ability to target three quite different architectures (GPUs, multi-core SIMD instructions, and FPGAs) is possible due to the careful design of the Accelerator library, which picks just the right level of abstraction for the data and its associated data-parallel operations. A series of examples has been developed, including applications for image processing and motion estimation.
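The execution model the abstract describes — operations on data-parallel arrays build a delayed expression graph, which a target then jit-compiles and evaluates as a whole — can be sketched in miniature. The following is not Accelerator's actual API; it is a hypothetical Python sketch (class `PA`, function `evaluate` are invented names) showing how operator overloading in a host language yields an embedded DSL whose graph a backend could lower to GPU, SIMD, or FPGA code:

```python
class PA:
    """Delayed data-parallel array: operators record an expression graph
    instead of computing immediately (illustrative, not Accelerator's API)."""

    def __init__(self, op, args, data=None):
        self.op, self.args, self.data = op, args, data

    @staticmethod
    def const(xs):
        # Wrap concrete host-language data as a leaf node.
        return PA('const', [], list(xs))

    # Element-wise operations just build nodes.
    def __add__(self, other):
        return PA('add', [self, other])

    def __mul__(self, other):
        return PA('mul', [self, other])

    def sum(self):
        # A reduction node.
        return PA('sum', [self])


def evaluate(e):
    """Stand-in for the jit step: walk the whole graph at once.
    A real target would emit GPU shader / SIMD / VHDL code here."""
    if e.op == 'const':
        return e.data
    vals = [evaluate(a) for a in e.args]
    if e.op == 'add':
        return [x + y for x, y in zip(vals[0], vals[1])]
    if e.op == 'mul':
        return [x * y for x, y in zip(vals[0], vals[1])]
    if e.op == 'sum':
        return sum(vals[0])
    raise ValueError(f'unknown op {e.op!r}')


a = PA.const([1.0, 2.0, 3.0])
b = PA.const([4.0, 5.0, 6.0])
dot = (a * b).sum()       # builds a graph; nothing is computed yet
result = evaluate(dot)    # "jit" point: the whole graph is evaluated
```

Because whole expression graphs reach the backend at once, the library can fuse element-wise operations and pick per-target code generation strategies, which is what lets one abstraction level serve GPUs, SIMD multi-cores, and (off-line) FPGA netlists.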
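The higher-level layering mentioned above — Batcher's bitonic sorters and mergers built from composable combinators — was done in F#; as a rough illustration of the underlying network, here is a plain Python sketch of recursive bitonic sorting (function names are my own; input length must be a power of two):

```python
def compare_swap(xs, up):
    """Half-cleaner stage: compare-exchange element i with element i + n/2."""
    n = len(xs) // 2
    for i in range(n):
        if (xs[i] > xs[i + n]) == up:
            xs[i], xs[i + n] = xs[i + n], xs[i]
    return xs


def bitonic_merge(xs, up):
    """Merge a bitonic sequence into a monotonic one (Batcher's merger)."""
    if len(xs) <= 1:
        return xs
    compare_swap(xs, up)
    n = len(xs) // 2
    return bitonic_merge(xs[:n], up) + bitonic_merge(xs[n:], up)


def bitonic_sort(xs, up=True):
    """Sort by composing two opposite-direction sorts with a merge."""
    if len(xs) <= 1:
        return list(xs)
    n = len(xs) // 2
    first = bitonic_sort(xs[:n], True)     # ascending half
    second = bitonic_sort(xs[n:], False)   # descending half
    return bitonic_merge(first + second, up)
```

All compare-exchanges within one `compare_swap` stage are independent, which is exactly why this network maps well onto data-parallel targets; in the F# combinator formulation, higher-order functions compose the stages rather than explicit recursion over Python lists.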