An asymmetric distributed shared memory model for heterogeneous parallel systems

  • Authors and affiliations:
  • Isaac Gelado (Universitat Politecnica de Catalunya, Barcelona, Spain)
  • John E. Stone (University of Illinois, Urbana-Champaign, IL, USA)
  • Javier Cabezas (Universitat Politecnica de Catalunya, Barcelona, Spain)
  • Sanjay Patel (University of Illinois, Urbana-Champaign, IL, USA)
  • Nacho Navarro (Universitat Politecnica de Catalunya, Barcelona, Spain)
  • Wen-mei W. Hwu (University of Illinois, Urbana-Champaign, IL, USA)

  • Venue:
  • Proceedings of the Fifteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XV)
  • Year:
  • 2010

Abstract

Heterogeneous computing combines general-purpose CPUs with accelerators to efficiently execute both the sequential, control-intensive and the data-parallel phases of applications. Existing programming models for heterogeneous computing rely on programmers to explicitly manage data transfers between the CPU system memory and accelerator memory. This paper presents a new programming model for heterogeneous computing, called Asymmetric Distributed Shared Memory (ADSM), that maintains a shared logical memory space in which CPUs can access objects in the accelerator physical memory but not vice versa. This asymmetry allows lightweight implementations that avoid common pitfalls of symmetric distributed shared memory systems. ADSM lets programmers assign data objects to performance-critical methods. When a method is selected for accelerator execution, its associated data objects are allocated within the shared logical memory space, which is hosted in the accelerator physical memory and transparently accessible to the methods executed on CPUs. We argue that ADSM reduces the programming effort for heterogeneous computing systems and enhances application portability. We present a software implementation of ADSM, called GMAC, built on top of CUDA in a GNU/Linux environment. We show that applications written for ADSM and running on top of GMAC achieve performance comparable to their counterparts using programmer-managed data transfers. This paper presents the GMAC system and evaluates different design choices. We further suggest additional architectural support that will likely allow GMAC to achieve higher application performance than the current CUDA model.
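
To make the model concrete, the sketch below contrasts the programmer-managed data transfers of plain CUDA with ADSM-style code, in which a single pointer names an object that is hosted in accelerator memory yet remains directly readable and writable from the CPU. The adsmAlloc and adsmFree names are hypothetical placeholders for an ADSM allocation interface, not the published GMAC API.

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical ADSM allocation interface; placeholders that illustrate
     * the model described in the abstract, not the published GMAC API. */
    void *adsmAlloc(size_t bytes);
    void  adsmFree(void *ptr);

    __global__ void scale(float *v, float k, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) v[i] *= k;
    }

    /* Conventional CUDA: the programmer keeps two copies of the data and
     * explicitly transfers it between host and accelerator memory. */
    void scale_cuda(float *host, int n) {
        float *dev;
        cudaMalloc((void **)&dev, n * sizeof(float));
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
        scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev);
    }

    /* ADSM style: one pointer names the object in the shared logical space,
     * hosted in accelerator physical memory; CPU accesses through it are
     * serviced transparently, so no explicit copies appear in the code. */
    void scale_adsm(int n) {
        float *v = (float *)adsmAlloc(n * sizeof(float));
        for (int i = 0; i < n; ++i) v[i] = (float)i;  /* CPU writes directly */
        scale<<<(n + 255) / 256, 256>>>(v, 2.0f, n);  /* same pointer on GPU */
        cudaDeviceSynchronize();
        printf("last = %f\n", v[n - 1]);              /* CPU reads directly */
        adsmFree(v);
    }

Note how the asymmetry keeps such a runtime lightweight: only CPU accesses to shared objects need interception (for example, via page protection), while accelerator code always operates on its own local memory at full speed.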