Address-aware fences

Authors:
Changhui Lin; Vijay Nagarajan; Rajiv Gupta
Affiliations:
CSE Department, University of California, Riverside, Riverside, CA, USA; School of Informatics, University of Edinburgh, Edinburgh, United Kingdom; CSE Department, University of California, Riverside, Riverside, CA, USA
Venue:
Proceedings of the 27th international ACM conference on International conference on supercomputing
Year:
2013

Abstract

Many modern multicore architectures support shared memory for ease of programming and relaxed memory models to deliver high performance. Under relaxed memory models, memory accesses can be reordered dynamically, so other processors may observe them in an order different from program order. Fence instructions are therefore provided to enforce the memory orderings that are critical to a program's correctness. However, fence instructions are costly because they stall the processor, and prior work has observed that most dynamic executions of fence instructions are unnecessary. In this paper we propose the address-aware fence, a hardware solution that reduces the overhead of fence instructions without resorting to speculation. An address-aware fence enforces only the memory orderings that are necessary to preserve the effect a traditional fence strives to achieve. It does so by dynamically checking the condition under which an execution of a fence must take effect and delay the memory accesses that follow it. When a fence instruction is encountered, the necessary memory addresses are first collected to form a watchlist; then only the memory accesses to addresses contained in the watchlist are delayed. Memory accesses whose addresses are not in the watchlist are allowed to complete without waiting for pending memory accesses from before the fence to complete. Our experiments on a set of concurrent lock-free algorithms and the SPLASH-2 benchmarks show that address-aware fences eliminate nearly all of the overhead due to fences and achieve an average performance improvement of 12.2% over programs that use traditional fences.
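The watchlist rule described in the abstract can be illustrated with a toy, single-core software model. This is a sketch under stated assumptions, not the paper's hardware design: the watchlist is simply passed in by the caller (how the hardware decides which addresses are "necessary" is not modeled), a counter stands in for the pending memory accesses from before the fence, and the names `AddressAwareFence` and `traditional_fence_may_complete` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <utility>

using Addr = std::uint64_t;

// Hypothetical toy model of one fence on one core.
// Assumption: the watchlist is supplied by the caller; the selection of
// "necessary" addresses performed by the real hardware is not modeled.
class AddressAwareFence {
public:
    AddressAwareFence(std::unordered_set<Addr> watchlist, int pending_pre_fence)
        : watchlist_(std::move(watchlist)), pending_(pending_pre_fence) {
        assert(pending_pre_fence >= 0);
    }

    // Record that one memory access from before the fence has completed.
    void pre_fence_access_completed() {
        if (pending_ > 0) --pending_;
    }

    // Address-aware rule: once every pre-fence access has completed, the
    // fence no longer delays anything; until then, only post-fence accesses
    // to watched addresses must wait.
    bool may_complete(Addr addr) const {
        if (pending_ == 0) return true;
        return watchlist_.count(addr) == 0;
    }

private:
    std::unordered_set<Addr> watchlist_;  // addresses collected at the fence
    int pending_;                         // pre-fence accesses still pending
};

// Traditional-fence rule, for comparison: every post-fence access waits
// until all pre-fence accesses have completed, regardless of its address.
bool traditional_fence_may_complete(int pending_pre_fence) {
    return pending_pre_fence == 0;
}
```

For example, with a watchlist of `{0x40}` and two pending pre-fence accesses, a post-fence access to `0x80` may complete immediately, while an access to `0x40` waits until both pre-fence accesses have completed; under the traditional rule both accesses would wait. The design choice being illustrated is that the fence's stall is narrowed from "all later accesses" to "later accesses that hit the watchlist".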