End-to-end sequential consistency

Authors:
Abhayendra Singh;Satish Narayanasamy;Daniel Marino;Todd Millstein;Madanlal Musuvathi
Affiliations:
University of Michigan, Ann Arbor;University of Michigan, Ann Arbor;Symantec;University of California, Los Angeles;Microsoft Research, Redmond
Venue:
Proceedings of the 39th Annual International Symposium on Computer Architecture
Year:
2012

Citing 48
Cited 5

Efficient and correct execution of parallel programs that share memory

ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiling programs with user parallelism

Selected papers of the second workshop on Languages and compilers for parallel computing
Detecting data races on weak memory systems

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Transactional memory: architectural support for lock-free data structures

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Designing memory consistency models for shared-memory multiprocessors

Designing memory consistency models for shared-memory multiprocessors
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Source-level debugging of scalar optimized code

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Generating representative Web workloads for network and server performance evaluation

SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Is SC + ILP = RC?

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A system-level specification framework for I/O architectures

Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Weak ordering—a new definition

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Type-based race detection for Java

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Pointer and escape analysis for multithreaded programs

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
A parameterized type system for race-free Java programs

OOPSLA '01 Proceedings of the 16th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Ownership types for safe programming: preventing data races and deadlocks

OOPSLA '02 Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Multiprocessors Should Support Simple Memory-Consistency Models

Computer
Speculative Sequential Consistency with Little Custom Storage

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
ReEnact: using thread-level speculation mechanisms to debug data races in multithreaded codes

Proceedings of the 30th annual international symposium on Computer architecture
Stack allocation and synchronization optimizations for Java using escape analysis

ACM Transactions on Programming Languages and Systems (TOPLAS)
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Transactional Memory Coherence and Consistency

Proceedings of the 31st annual international symposium on Computer architecture
Programming with transactional coherence and consistency (TCC)

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
The Java memory model

Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Compiler techniques for high performance sequentially consistent java programs

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Making Sequential Consistency Practical in Titanium

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Mechanisms for store-wait-free multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
BulkSC: bulk enforcement of sequential consistency

Proceedings of the 34th annual international symposium on Computer architecture
Automatically classifying benign and harmful data races using replay analysis

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Goldilocks: a race and transaction-aware java runtime

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs

IEEE Transactions on Computers
Execution replay of multiprocessor virtual machines

Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Foundations of the C++ concurrency memory model

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
FastTrack: efficient and precise dynamic race detection

Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Reactive NUCA: near-optimal block placement and replication in distributed caches

Proceedings of the 36th annual international symposium on Computer architecture
InvisiFence: performance-transparent memory ordering in conventional multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
SigRace: signature-based data race detection

Proceedings of the 36th annual international symposium on Computer architecture
BulkCompiler: high-performance sequential consistency through cooperative compiler and hardware support

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
DRFX: a simple and efficient memory model for concurrent programming languages

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Conflict exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races

Proceedings of the 37th annual international symposium on Computer architecture
Subspace snooping: filtering snoops with operating system support

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Efficient sequential consistency using conditional fences

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Transactional Memory, 2nd Edition

Transactional Memory, 2nd Edition
Efficient processor support for DRFx, a memory model with exceptions

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
A case for an SC-preserving compiler

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks

Proceedings of the 38th annual international symposium on Computer architecture
Efficient sequential consistency via conflict ordering

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems

TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Fast RMWs for TSO: semantics and implementation

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Address-aware fences

Proceedings of the 27th international ACM conference on International conference on supercomputing
WeeFence: toward making fences free in TSO

Proceedings of the 40th Annual International Symposium on Computer Architecture
Fence-free work stealing on bounded TSO processors

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Sequential consistency (SC) is arguably the most intuitive behavior for a shared-memory multithreaded program. It is widely accepted that language-level SC could significantly improve programmability of a multiprocessor system. However, efficiently supporting end-to-end SC remains a challenge as it requires that both compiler and hardware optimizations preserve SC semantics. While a recent study has shown that a compiler can preserve SC semantics for a small performance cost, an efficient and complexity-effective SC hardware remains elusive. Past hardware solutions relied on aggressive speculation techniques, which has not yet been realized in a practical implementation. This paper exploits the observation that hardware need not enforce any memory model constraints on accesses to thread-local and shared read-only locations. A processor can easily determine a large fraction of these safe accesses with assistance from static compiler analysis and the hardware memory management unit. We discuss a low-complexity hardware design that exploits this information to reduce the overhead in ensuring SC. Our design employs an additional unordered store buffer for fast-tracking thread-local stores and allowing later memory accesses to proceed without a memory ordering related stall. Our experimental study shows that the cost of guaranteeing end-to-end SC is only 6.2% on average when compared to a system with TSO hardware executing a stock compiler's output.