Heterogeneous-race-free memory models

  • Authors:
  • Derek R. Hower;Blake A. Hechtman;Bradford M. Beckmann;Benedict R. Gaster;Mark D. Hill;Steven K. Reinhardt;David A. Wood

  • Affiliations:
  • AMD, Bellevue, WA, USA;Duke University, Durham, NC, USA;AMD, Bellevue, WA, USA;AMD, Sunnyvale, CA, USA;University of Wisconsin-Madison, Madison, WI, USA;AMD, Bellevue, WA, USA;University of Wisconsin-Madison, Madison, WI, USA

  • Venue:
  • Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

Commodity heterogeneous systems (e.g., integrated CPUs and GPUs), now support a unified, shared memory address space for all components. Because the latency of global communication in a heterogeneous system can be prohibi-tively high, heterogeneous systems (unlike homogeneous CPU systems) provide synchronization mechanisms that only guarantee ordering among a subset of threads, which we call a scope. Unfortunately, the consequences and se-mantics of these scoped operations are not yet well under-stood. Without a formal and approachable model to reason about the behavior of these operations, we risk an array of portability and performance issues. In this paper, we embrace scoped synchronization with a new class of memory consistency models that add scoped synchronization to data-race-free models like those of C++ and Java. Called sequential consistency for heterogeneous-race-free (SC for HRF), the new models guarantee SC for programs with "sufficient" synchronization (no data races) of "sufficient" scope. We discuss two such models. The first, HRF-direct, works well for programs with highly regular parallelism. The second, HRF-indirect, builds on HRF-direct by allowing synchronization using different scopes in some cases involving transitive communication. We quanti-tatively show that HRF-indirect encourages forward-looking programs with irregular parallelism by showing up to a 10% performance increase in a task runtime for GPUs.