Specifying Java thread semantics using a uniform memory model
JGI '02 Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande
The complexity of verifying memory coherence
Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures
The Complexity of Verifying Memory Coherence and Consistency
IEEE Transactions on Parallel and Distributed Systems
Introducing technology into the Linux kernel: a case study
ACM SIGOPS Operating Systems Review - Research and developments in the Linux kernel
x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors
Communications of the ACM
Understanding POWER multiprocessors
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Incremental dynamic updates with first-class contexts
TOOLS'12 Proceedings of the 50th international conference on Objects, Models, Components, Patterns
A formal hierarchy of weak memory models
Formal Methods in System Design
Fast asymmetric thread synchronization
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Hi-index | 0.02 |
The memory consistency model for a shared-memory multiprocessor specifies the behavior of memory with respect to read and write operations from multiple processors. As such, the memory model influences many aspects of system design, including the design of programming languages, compilers, and the underlying hardware. Relaxed models that impose fewer memory ordering constraints offer the potential for higher performance by allowing hardware and software to overlap and reorder memory operations. However, fewer ordering guarantees can compromise programmability and portability. Many of the previously proposed models either fail to provide reasonable programming semantics or are biased toward programming ease at the cost of sacrificing performance. Furthermore, the lack of consensus on an acceptable model hinders software portability across different systems. This dissertation focuses on providing a balanced solution that directly addresses the trade-off between programming ease and performance. To address programmability, we propose an alternative method for specifying memory behavior that presents a higher level abstraction to the programmer. We show that with only a few types of information supplied by the programmer, an implementation can exploit the full range of optimizations enabled by previous models. Furthermore, the same information enables automatic and efficient portability across a wide range of implementations. To expose the optimizations enabled by a model, we have developed a formal framework for specifying the low-level ordering constraints that must be enforced by an implementation. Based on these specifications, we present a wide range of architecture and compiler implementation techniques for efficiently supporting a given model. Finally, we evaluate the performance benefits of exploiting relaxed models based on detailed simulations of realistic parallel applications. Our results show that the optimizations enabled by relaxed models are extremely effective in hiding virtually the full latency of writes in architectures with blocking reads (i.e., processor stalls on reads), with gains as high as 80\%. Architectures with non-blocking reads can further exploit relaxed models to hide a substantial fraction of the read latency as well, leading to a larger overall performance benefit. Furthermore, these optimizations complement gains from other latency hiding techniques such as prefetching and multiple contexts. We believe that the combined benefits in hardware and software will make relaxed models universal in future multiprocessors, as is already evidenced by their adoption in several commercial systems.