WeeFence: toward making fences free in TSO

Authors:
Yuelu Duan;Abdullah Muzahid;Josep Torrellas
Affiliations:
University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign
Venue:
Proceedings of the 40th Annual International Symposium on Computer Architecture
Year:
2013

Citing 24
Cited 1

Efficient and correct execution of parallel programs that share memory

ACM Transactions on Programming Languages and Systems (TOPLAS)
The SPARC architecture manual (version 9)

The SPARC architecture manual (version 9)
Using speculative retirement and larger instruction windows to narrow the performance gap between memory consistency models

Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
The implementation of the Cilk-5 multithreaded language

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Is SC + ILP = RC?

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Speculative lock elision: enabling highly concurrent multithreaded execution

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Speculative synchronization: applying thread-level speculation to explicitly parallel applications

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Hiding Relaxed Memory Consistency with Compilers

PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Compiler techniques for high performance sequentially consistent java programs

Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Mechanisms for store-wait-free multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
BulkSC: bulk enforcement of sequential consistency

Proceedings of the 34th annual international symposium on Computer architecture
CheckFence: checking consistency of concurrent data types on relaxed memory models

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs

IEEE Transactions on Computers
Foundations of the C++ concurrency memory model

Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Detecting and Eliminating Potential Violations of Sequential Consistency for Concurrent C/C++ Programs

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
InvisiFence: performance-transparent memory ordering in conventional multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
x86-TSO: a rigorous and usable programmer's model for x86 multiprocessors

Communications of the ACM
Efficient sequential consistency using conditional fences

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Sound and complete monitoring of sequential consistency for relaxed memory models

TACAS'11/ETAPS'11 Proceedings of the 17th international conference on Tools and algorithms for the construction and analysis of systems: part of the joint European conferences on theory and practice of software
A case for an SC-preserving compiler

Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Efficient sequential consistency via conflict ordering

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
End-to-end sequential consistency

Proceedings of the 39th Annual International Symposium on Computer Architecture
Volition: scalable and precise sequential consistency violation detection

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Vulcan: Hardware Support for Detecting Sequential Consistency Violations Dynamically

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture

Fence-free work stealing on bounded TSO processors

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Although fences are designed for low-overhead concurrency coordination, they can be expensive in current machines. If fences were largely free, faster fine-grained concurrent algorithms could be devised, and compilers could guarantee Sequential Consistency (SC) at little cost. In this paper, we present WeeFence (or WFence for short), a fence that is very cheap because it allows post-fence accesses to skip it. Such accesses can typically complete and retire before the pre-fence writes have drained from the write buffer. Only when an incorrect reordering of accesses is about to happen, does the hardware stall to prevent it. In the paper, we present the WFence design for TSO, and compare it to a conventional fence with speculation for 8-processor multicore simulations. We run parallel kernels that contain explicit fences and parallel applications that do not. For the kernels, WFence eliminates nearly all of the fence stall, reducing the kernels' execution time by an average of 11%. For the applications, a conservative compiler algorithm places fences in the code to guarantee SC. In this case, on average, WFences reduce the resulting fence overhead from 38% of the applications' execution time to 2% (in a centralized WFence design), or from 36% to 5% (in a distributed WFence design).