VForce: An environment for portable applications on high performance systems with accelerators

  • Authors:
  • Nicholas Moore, Miriam Leeser, Laurie Smith King

  • Affiliations:
  • Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, United States (Moore, Leeser); Department of Mathematics and Computer Science, College of the Holy Cross, Worcester, MA, United States (King)

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2012

Abstract

Special Purpose Processors (SPPs), including Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs), are increasingly used to accelerate scientific applications. VForce aims to help application programmers use such accelerators with minimal changes to user code. It is an extensible middleware framework that enables VSIPL++ (the Vector Signal Image Processing Library extension) programs to use SPPs transparently while remaining portable across platforms with and without SPP hardware. The framework maintains a VSIPL++-like environment that hides hardware-specific details from the application programmer while preserving performance and productivity. VForce focuses on the interface between application code and accelerator code: the same application code can run in software on a general purpose processor or take advantage of SPPs when they are available. VForce is unique in supporting calls to both FPGAs and GPUs with no changes to user code. Results on systems with NVIDIA Tesla GPUs and Xilinx FPGAs are presented. This paper describes VForce, illustrates its support for portability, and discusses lessons learned in supporting different hardware configurations at run time. Key considerations involve global knowledge of the relationships among processing steps when defining application mapping, memory allocation, and task parallelism.