Memory Consistency Models for Shared-Memory Multiprocessors

Authors:
Kourosh Gharachorloo
Venue:
Memory Consistency Models for Shared-Memory Multiprocessors
Year:
1995

Citing 0
Cited 11

Specifying Java thread semantics using a uniform memory model
JGI '02 Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande

Shared Memory Consistency Models: A Tutorial
Computer

The complexity of verifying memory coherence
Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures

A Framework for Formalization and Strictness Analysis of Simulation Event Orderings
Simulation

The Complexity of Verifying Memory Coherence and Consistency
IEEE Transactions on Parallel and Distributed Systems

Introducing technology into the Linux kernel: a case study
ACM SIGOPS Operating Systems Review - Research and developments in the Linux kernel

x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors
Communications of the ACM

Understanding POWER multiprocessors
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation

Incremental dynamic updates with first-class contexts
TOOLS'12 Proceedings of the 50th international conference on Objects, Models, Components, Patterns

A formal hierarchy of weak memory models
Formal Methods in System Design

Fast asymmetric thread synchronization
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers

Abstract

The memory consistency model for a shared-memory multiprocessor specifies the behavior of memory with respect to read and write operations from multiple processors. As such, the memory model influences many aspects of system design, including the design of programming languages, compilers, and the underlying hardware. Relaxed models that impose fewer memory ordering constraints offer the potential for higher performance by allowing hardware and software to overlap and reorder memory operations. However, fewer ordering guarantees can compromise programmability and portability. Many of the previously proposed models either fail to provide reasonable programming semantics or are biased toward programming ease at the cost of sacrificing performance. Furthermore, the lack of consensus on an acceptable model hinders software portability across different systems. This dissertation focuses on providing a balanced solution that directly addresses the trade-off between programming ease and performance. To address programmability, we propose an alternative method for specifying memory behavior that presents a higher-level abstraction to the programmer. We show that with only a few types of information supplied by the programmer, an implementation can exploit the full range of optimizations enabled by previous models. Furthermore, the same information enables automatic and efficient portability across a wide range of implementations. To expose the optimizations enabled by a model, we have developed a formal framework for specifying the low-level ordering constraints that must be enforced by an implementation. Based on these specifications, we present a wide range of architecture and compiler implementation techniques for efficiently supporting a given model. Finally, we evaluate the performance benefits of exploiting relaxed models based on detailed simulations of realistic parallel applications.
Our results show that the optimizations enabled by relaxed models are extremely effective in hiding virtually the full latency of writes in architectures with blocking reads (i.e., the processor stalls on reads), with gains as high as 80%. Architectures with non-blocking reads can further exploit relaxed models to hide a substantial fraction of the read latency as well, leading to a larger overall performance benefit. Furthermore, these optimizations complement gains from other latency-hiding techniques such as prefetching and multiple contexts. We believe that the combined benefits in hardware and software will make relaxed models universal in future multiprocessors, as is already evidenced by their adoption in several commercial systems.