Efficient sequential consistency using conditional fences

Authors:
Changhui Lin;Vijay Nagarajan;Rajiv Gupta
Affiliations:
CSE Department, University of California, Riverside, CA, USA;School of Informatics, University of Edinburgh, Edinburgh, United Kingdom;CSE Department, University of California, Riverside, CA, USA
Venue:
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Year:
2010

Citing 23
Cited 9

Efficient and correct execution of parallel programs that share memory

ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizing parallel programs with explicit synchronization

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Analyses and optimizations for shared address space programs

Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Is SC + ILP = RC?

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Hiding Relaxed Memory Consistency with a Compiler

IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Shared Memory Consistency Models: A Tutorial

Computer
Dependence Analysis in Parallel Loops with i±k Subscripts

LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Speculative Sequential Consistency with Little Custom Storage

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Cooperating sequential processes

The origin of concurrent programming
Automatic fence insertion for shared memory multiprocessing

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Transactional Memory Coherence and Consistency

Proceedings of the 31st annual international symposium on Computer architecture
Compiler techniques for high performance sequentially consistent java programs

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Making Sequential Consistency Practical in Titanium

SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Mechanisms for store-wait-free multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
BulkSC: bulk enforcement of sequential consistency

Proceedings of the 34th annual international symposium on Computer architecture
Practical escape analyses: how good are they?

Proceedings of the 3rd international conference on Virtual execution environments
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs

IEEE Transactions on Computers
A Scalable, Non-blocking Approach to Transactional Memory

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Detecting and Eliminating Potential Violations of Sequential Consistency for Concurrent C/C++ Programs

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
InvisiFence: performance-transparent memory ordering in conventional multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
BulkCompiler: high-performance sequential consistency through cooperative compiler and hardware support

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture

There is nothing wrong with out-of-thin-air: compiler optimization and memory models

Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Location-based memory fences

Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures
A case for an SC-preserving compiler

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Efficient sequential consistency via conflict ordering

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
End-to-end sequential consistency

Proceedings of the 39th Annual International Symposium on Computer Architecture
Volition: scalable and precise sequential consistency violation detection

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Address-aware fences

Proceedings of the 27th international ACM conference on International conference on supercomputing
WeeFence: toward making fences free in TSO

Proceedings of the 40th Annual International Symposium on Computer Architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Among the various memory consistency models, the sequential consistency (SC) model, in which memory operations appear to take place in the order specified by the program, is the most intuitive and enables programmers to reason about their parallel programs the best. Nevertheless, processor designers often choose to support relaxed memory consistency models because the weaker ordering constraints imposed by such models allow for more instructions to be reordered and enable higher performance. Programs running on machines supporting weaker consistency models, can be transformed into ones in which SC is enforced. The compiler does this by computing a minimal set of memory access pairs whose ordering automatically guarantees SC. To ensure that these memory access pairs are not reordered, memory fences are inserted. Unfortunately, insertion of such memory fences can significantly slowdown the program. We observe that the ordering of the minimal set of memory accesses that the compiler strives to enforce, is typically already enforced in the normal course of program execution. A study we conducted on programs with compiler inserted memory fences shows that only 8% of the executed instances of the memory fences are really necessary to ensure SC. Motivated by this study we propose the conditional fence mechanism (C-Fence) that utilizes compiler information to decide dynamically if there is a need to stall at each fence. Our experiments with SPLASH-2 benchmarks show that, with C-Fences, programs can be transformed to enforce SC incurring only 12% slowdown, as opposed to 43% slowdown using normal fence instructions. Our approach requires very little hardware support (