One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation

  • Authors:
  • Ziyu Guo; Bo Wu; Xipeng Shen

  • Affiliations:
  • Qualcomm CDMA Technologies, San Diego, CA, USA; College of William and Mary, Williamsburg, VA, USA; College of William and Mary, Williamsburg, VA, USA

  • Venue:
  • Proceedings of the 26th ACM international conference on Supercomputing
  • Year:
  • 2012

Abstract

As an approach to promoting whole-system synergy on heterogeneous computing systems, compiling fine-grained SPMD-threaded code (e.g., GPU CUDA code) for multicore CPUs has recently drawn attention. This paper concentrates on two important sources of inefficiency that limit existing translators: overly strong synchronizations and thread-level partially redundant computations. We point out that both kinds of inefficiency stem from a single root cause: non-uniformity among threads. Based on that observation, we present a thread-level dependence analysis, which leads to a code generator with three novel features: an instance-level instruction scheduler for synchronization relaxation, a graph pattern recognition scheme for code shape optimization, and a fine-grained analysis for thread-level partial redundancy removal. Experiments show that the unified solution is effective in resolving both inefficiencies, yielding speedups of up to a factor of 14.
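The abstract gives no code. The following minimal Python sketch is an illustration under stated assumptions, not the authors' translator or algorithm. It shows what the two inefficiencies look like when a barrier-containing SPMD kernel is serialized for a CPU. The kernel body, the `GROUP` size, and the function names `naive`, `relaxed` and `deduplicated` are all hypothetical.

- `naive` splits the thread loop at the barrier, which is the conventional translation.
- `relaxed` relies on the fact that `out[tid]` reads only `s[tid]` and `s[tid + 1]`. Each consumer instance therefore runs right after the producer instance it needs, rather than after the whole pre-barrier phase.
- `deduplicated` evaluates a value shared by each group of `GROUP` threads once per group instead of once per thread.

```python
# Hypothetical one-block SPMD kernel, written per thread (illustrative only):
#
#     g = (tid // GROUP) * k            # identical for all threads in a group
#     s[tid] = a[tid] * g
#     __syncthreads()
#     out[tid] = s[tid] + s[(tid + 1) % n]
#
# The three functions below are hedged CPU translations of that kernel.

GROUP = 4  # assumed group size; not taken from the paper


def naive(a, k):
    """Loop fission at the barrier; every 'thread' recomputes g."""
    n = len(a)
    s = [0] * n
    out = [0] * n
    for tid in range(n):                      # phase before the barrier
        g = (tid // GROUP) * k
        s[tid] = a[tid] * g
    for tid in range(n):                      # phase after the barrier
        out[tid] = s[tid] + s[(tid + 1) % n]
    return out


def relaxed(a, k):
    """Barrier replaced by an instance-level schedule: out[tid] runs as soon
    as the two s[] instances it reads have been produced."""
    n = len(a)
    if n == 0:
        return []
    s = [0] * n
    out = [0] * n
    s[0] = a[0] * ((0 // GROUP) * k)          # first producer instance
    for tid in range(n - 1):
        nxt = tid + 1
        s[nxt] = a[nxt] * ((nxt // GROUP) * k)  # produce s[tid + 1] ...
        out[tid] = s[tid] + s[nxt]              # ... then consume it at once
    out[n - 1] = s[n - 1] + s[0]              # wrap-around instance, last
    return out


def deduplicated(a, k):
    """Thread-level partial redundancy removed: g is computed once per group
    of GROUP threads that would all compute the same value."""
    n = len(a)
    s = [0] * n
    out = [0] * n
    for base in range(0, n, GROUP):
        g = (base // GROUP) * k               # one evaluation per group
        for tid in range(base, min(base + GROUP, n)):
            s[tid] = a[tid] * g
    for tid in range(n):
        out[tid] = s[tid] + s[(tid + 1) % n]
    return out


if __name__ == "__main__":
    a = list(range(10))
    print(naive(a, 3), relaxed(a, 3), deduplicated(a, 3))
```

All three functions should produce identical results. The relaxed schedule is legal only because the dependence from `s` to `out` spans a bounded set of thread instances; a kernel whose consumers read arbitrary `s[]` entries would still need the full phase split.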