Efficient sequential consistency via conflict ordering

Authors:
Changhui Lin;Vijay Nagarajan;Rajiv Gupta;Bharghava Rajaram
Affiliations:
University of California Riverside, Riverside, CA, USA;University of Edinburgh, Edinburgh, United Kingdom;University of California Riverside, Riverside, CA, USA;University of Edinburgh, Edinburgh, United Kingdom
Venue:
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Year:
2012

Citing 28
Cited 6

Correct memory operation of cache-based multiprocessors

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Efficient and correct execution of parallel programs that share memory

ACM Transactions on Programming Languages and Systems (TOPLAS)
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Is SC + ILP = RC?

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Weak ordering—a new definition

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Hiding Relaxed Memory Consistency with a Compiler

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Shared Memory Consistency Models: A Tutorial

Computer
Multiprocessors Should Support Simple Memory-Consistency Models

Computer
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
Speculative Sequential Consistency with Little Custom Storage

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Automatic fence insertion for shared memory multiprocessing

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Transactional Memory Coherence and Consistency

Proceedings of the 31st annual international symposium on Computer architecture
Mechanisms for store-wait-free multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
BulkSC: bulk enforcement of sequential consistency

Proceedings of the 34th annual international symposium on Computer architecture
Goldilocks: a race and transaction-aware java runtime

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs

IEEE Transactions on Computers
A Scalable, Non-blocking Approach to Transactional Memory

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Race directed random testing of concurrent programs

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
InvisiFence: performance-transparent memory ordering in conventional multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors

Communications of the ACM
Memory models: a case for rethinking parallel languages and hardware

Communications of the ACM
DRFX: a simple and efficient memory model for concurrent programming languages

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Conflict exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races

Proceedings of the 37th annual international symposium on Computer architecture
Efficient sequential consistency using conditional fences

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Efficient processor support for DRFx, a memory model with exceptions

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
A case for an SC-preserving compiler

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Implicit transactional memory in kilo-instruction multiprocessors

ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture

End-to-end sequential consistency

Proceedings of the 39th Annual International Symposium on Computer Architecture
Volition: scalable and precise sequential consistency violation detection

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Fast RMWs for TSO: semantics and implementation

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Address-aware fences

Proceedings of the 27th international ACM conference on International conference on supercomputing
WeeFence: toward making fences free in TSO

Proceedings of the 40th Annual International Symposium on Computer Architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Although the sequential consistency (SC) model is the most intuitive, processor designers often choose to support relaxed memory consistency models for higher performance. This is because SC implementations that match the performance of relaxed memory models require post-retirement speculation and its associated hardware costs. In this paper we propose an efficient approach for enforcing SC without requiring post-retirement speculation. While prior SC implementations guarantee SC by explicitly completing memory operations within a processor in program order, we guarantee SC by completing conflicting memory operations, within and across processors, in an order that is consistent with the program order. More specifically, we identify those conflicting memory operations whose ordering is critical for the maintenance of SC and explicitly order them. This allows us to safely (non-speculatively) complete memory operations past pending writes, thus reducing memory ordering stalls. Our experiments with SPLASH-2 programs show that SC can be achieved efficiently, with performance comparable to RMO (relaxed memory order).