MCUDA: An Efficient Implementation of CUDA Kernels for Multi-core CPUs

  • Authors:
  • John A. Stratton;Sam S. Stone;Wen-Mei W. Hwu

  • Affiliations:
  • Center for Reliable and High-Performance Computing and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign,;Center for Reliable and High-Performance Computing and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign,;Center for Reliable and High-Performance Computing and Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign,

  • Venue:
  • Languages and Compilers for Parallel Computing
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

CUDA is a data parallel programming model that supports several key abstractions - thread blocks, hierarchical memory and barrier synchronization - for writing applications. This model has proven effective in programming GPUs. In this paper we describe a framework called MCUDA, which allows CUDA programs to be executed efficiently on shared memory, multi-core CPUs. Our framework consists of a set of source-level compiler transformations and a runtime system for parallel execution. Preserving program semantics, the compiler transforms threaded SPMD functions into explicit loops, performs fission to eliminate barrier synchronizations, and converts scalar references to thread-local data to replicated vector references. We describe an implementation of this framework and demonstrate performance approaching that achievable from manually parallelized and optimized C code. With these results, we argue that CUDA can be an effective data-parallel programming model for more than just GPU architectures.