An Extension of the StarSs Programming Model for Platforms with Multiple GPUs

  • Authors:
  • Eduard Ayguadé;Rosa M. Badia;Francisco D. Igual;Jesús Labarta;Rafael Mayo;Enrique S. Quintana-Ortí

  • Affiliations:
  • Barcelona Supercomputing Center --- Centro Nacional de Supercomputación, (BSC---CNS) and Universitat Politècnica de Catalunya, Barcelona, Spain 08034;Barcelona Supercomputing Center --- Centro Nacional de Supercomputación, (BSC---CNS) and Universitat Politècnica de Catalunya, Barcelona, Spain 08034 and Consejo Superior de Investigacio ...;Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I (UJI), Castellón, Spain 12.071;Barcelona Supercomputing Center --- Centro Nacional de Supercomputación, (BSC---CNS) and Universitat Politècnica de Catalunya, Barcelona, Spain 08034;Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I (UJI), Castellón, Spain 12.071;Depto. de Ingeniería y Ciencia de Computadores, Universidad Jaume I (UJI), Castellón, Spain 12.071

  • Venue:
  • Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
  • Year:
  • 2009

Abstract

While general-purpose homogeneous multi-core architectures are becoming ubiquitous, there are clear indications that, for a number of important applications, a better performance/power ratio can be attained using specialized hardware accelerators. These accelerators require specific SDKs or programming languages, which are not always easy to use. Thus, the impact of the new programming paradigms on programmer productivity will determine their success in the high-performance computing arena. In this paper we present GPU Superscalar (GPUSs), an extension of the Star Superscalar (StarSs) programming model that targets the parallelization of applications on platforms consisting of a general-purpose processor connected to multiple graphics processors. GPUSs deals with architecture heterogeneity and separate memory address spaces, while preserving simplicity and portability. Preliminary experimental results for a well-known operation in numerical linear algebra illustrate the correct adaptation of the runtime to a multi-GPU system, attaining notable performance.