Exploring memory consistency for massively-threaded throughput-oriented processors

Authors:
Blake A. Hechtman; Daniel J. Sorin
Affiliations:
Duke University, Durham, NC; Duke University, Durham, NC
Venue:
Proceedings of the 40th Annual International Symposium on Computer Architecture
Year:
2013

Abstract

We revisit hardware consistency models in the new context of massively-threaded throughput-oriented processors (MTTOPs). A prominent example of an MTTOP is a GPGPU; other examples include Intel's MIC architecture and some recent academic designs. MTTOPs differ from CPUs in many significant ways, including their ability to tolerate latency, their memory system organization, and the characteristics of the software they run. We compare implementations of various hardware consistency models for MTTOPs in terms of performance, energy efficiency, hardware complexity, and programmability. Our results show that the choice of hardware consistency model has surprisingly little impact on performance, and thus the decision should be based on hardware complexity, energy efficiency, and programmability. For many MTTOPs, it is likely that even a simple implementation of sequential consistency is attractive.
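
For readers unfamiliar with what distinguishes one hardware consistency model from another, the following is a minimal sketch, not taken from the paper, that enumerates the outcomes of the classic store-buffering litmus test. It compares sequential consistency (each thread's operations stay in program order) against a hypothetical relaxed model in which a store may be reordered after a later load to a different address, as store buffers effectively allow. The names `THREADS`, `run`, and `outcomes` are illustrative assumptions, and the toy model ignores caches, latency, and everything else the paper actually evaluates.

```python
from itertools import permutations

# Store-buffering litmus test (illustrative, not from the paper):
#   Thread 0: x = 1; r0 = y
#   Thread 1: y = 1; r1 = x
THREADS = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]


def run(schedule):
    """Apply a global order of (thread, op_index) pairs to one shared memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for t, i in schedule:
        kind, addr, arg = THREADS[t][i]
        if kind == "store":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return regs["r0"], regs["r1"]


def outcomes(keep_program_order):
    """Return the set of (r0, r1) results reachable under the chosen model.

    keep_program_order=True  -> sequential consistency
    keep_program_order=False -> toy relaxed model: a thread's store and its
                                later load (different addresses) may swap
    """
    ops = [(t, i) for t, thread in enumerate(THREADS) for i in range(len(thread))]
    seen = set()
    for order in permutations(ops):
        if keep_program_order:
            pos = {op: k for k, op in enumerate(order)}
            in_order = all(
                pos[(t, i)] < pos[(t, i + 1)]
                for t, thread in enumerate(THREADS)
                for i in range(len(thread) - 1)
            )
            if not in_order:
                continue
        seen.add(run(order))
    return seen


if __name__ == "__main__":
    sc = outcomes(keep_program_order=True)
    relaxed = outcomes(keep_program_order=False)
    print("SC outcomes:     ", sorted(sc))
    print("Relaxed outcomes:", sorted(relaxed))
    print("Only relaxed:    ", sorted(relaxed - sc))
```

In this sketch the only result the relaxed model adds is `r0 == 0 and r1 == 0`, which no sequentially consistent interleaving can produce. That extra outcome is the kind of programmability cost the abstract weighs against performance, energy efficiency, and hardware complexity.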