Software Transactional Memory for GPU Architectures

  • Authors:
  • Yunlong Xu;Rui Wang;Nilanjan Goswami;Tao Li;Lan Gao;Depei Qian

  • Affiliations:
  • School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China;School of Computer Science and Engineering, Beihang University, Beijing, China;ECE Department, University of Florida, Gainesville, USA;ECE Department, University of Florida, Gainesville, USA;School of Computer Science and Engineering, Beihang University, Beijing, China;School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China and School of Computer Science and Engineering, Beihang University, Beijing, China

  • Venue:
  • Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
  • Year:
  • 2014

Abstract

Modern GPUs have shown promising results in accelerating computation-intensive and numerical workloads with limited dynamic data sharing. However, many real-world applications exhibit substantial data sharing among concurrently executing threads. Such data sharing often requires a mutual exclusion mechanism to ensure data integrity in a multithreaded environment. Although modern GPUs provide atomic primitives that can be leveraged to construct fine-grained locks, lock-based synchronization requires significant programming effort to achieve functional correctness. The massive multithreading and SIMT execution paradigm of GPUs further compound the challenges of GPU locks. To let applications with dynamic data sharing benefit from GPU acceleration, we propose a novel software transactional memory system for GPU architectures (GPU-STM). The major challenges are ensuring good scalability with respect to the massive multithreading of GPUs, and preventing livelocks caused by the SIMT execution paradigm of GPUs. To address these two challenges, we propose (1) a hierarchical validation technique and (2) an encounter-time lock-sorting mechanism, respectively. We build our GPU-STM prototype on a commercially available GPU platform and runtime. Our evaluation on a real system shows that GPU-STM outperforms coarse-grained locks on GPUs by up to 20x.
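
The abstract names encounter-time lock-sorting but does not describe how it works. As a rough illustration of the general idea only, the sketch below keeps the set of locks a transaction touches in ascending order as each one is encountered, then acquires them in that single global order, so two transactions can never wait on each other in a cycle. This is a host-side C++ analogue and not the GPU-STM implementation: the names `lock_table`, `record_lock`, `acquire_all`, and `release_all` are hypothetical, and on a GPU the lock words would presumably live in global memory and be updated with an atomic compare-and-swap.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical lock table (an assumption, not from the paper):
// one word per lock, 0 = free, 1 = held.
constexpr std::size_t kNumLocks = 8;
std::atomic<int> lock_table[kNumLocks];  // static storage, so all words start at 0

// Record a lock index when the transaction first encounters it.
// The set is kept in ascending order and free of duplicates.
void record_lock(std::vector<std::size_t>& lock_set, std::size_t id) {
    auto pos = std::lower_bound(lock_set.begin(), lock_set.end(), id);
    if (pos == lock_set.end() || *pos != id) {
        lock_set.insert(pos, id);
    }
}

// Acquire every recorded lock in ascending index order.
// Because all transactions use the same order, circular waits cannot form.
void acquire_all(const std::vector<std::size_t>& lock_set) {
    for (std::size_t id : lock_set) {
        int expected = 0;
        while (!lock_table[id].compare_exchange_weak(
                   expected, 1, std::memory_order_acquire)) {
            expected = 0;  // a failed CAS overwrites 'expected'; reset and retry
        }
    }
}

// Release every recorded lock.
void release_all(const std::vector<std::size_t>& lock_set) {
    for (std::size_t id : lock_set) {
        lock_table[id].store(0, std::memory_order_release);
    }
}
```

A transaction that touches locks 5, 2, and 5 again would end up with the lock set {2, 5} and would take lock 2 before lock 5, regardless of the order in which the accesses occurred.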