Fast asymmetric thread synchronization

Authors:
Jimmy Cleary;Owen Callanan;Mark Purcell;David Gregg
Affiliations:
Trinity College Dublin, Dublin, Ireland;IBM Research, Dublin, Ireland;IBM Research, Dublin, Ireland;Trinity College Dublin and Irish Software Engineering Research Centre (LERO), Dublin, Ireland
Venue:
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Year:
2013

Abstract

For most multi-threaded applications, data structures must be shared between threads. Ensuring thread safety on these data structures incurs overhead in the form of locking and other synchronization mechanisms. Where data is shared among multiple threads, these costs are unavoidable. However, a common access pattern is that data is accessed primarily by one dominant thread, and only very rarely by the other, non-dominant threads. Previous research has proposed biased locks, which are optimized for a single dominant thread, at the cost of greater overheads for non-dominant threads. In this article, we propose a new family of biased synchronization mechanisms that, using a modified interface, push accesses to shared data from the non-dominant threads to the dominant one, via a novel set of message-passing mechanisms. We present mechanisms for protecting critical sections, for queueing work, for caching shared data in registers where it is safe to do so, and for asynchronous critical-section accesses. We present results for the conventional Intel® Sandy Bridge processor and for the emerging network-optimized many-core IBM® PowerEN™ processor. We find that our algorithms compete well with existing biased locking algorithms and, in particular, perform better than existing algorithms as accesses from non-dominant threads increase.
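The general idea of pushing accesses from non-dominant threads to the dominant one can be illustrated with a minimal sketch. This is not the algorithm from the article: the class `OwnedCounter`, its methods, and `run_demo` are hypothetical names chosen for illustration, and Python's `queue.Queue` uses locks internally, so the sketch shows only the shape of the interface, not the performance characteristics. The dominant thread updates the data directly, while other threads post a request and wait for the dominant thread to execute it on their behalf.

```python
import queue
import threading
import time


class OwnedCounter:
    """Counter whose state is only ever touched by one dominant thread."""

    def __init__(self):
        self.value = 0              # mutated only by the dominant thread
        self.inbox = queue.Queue()  # requests posted by non-dominant threads

    def owner_increment(self):
        # Dominant-thread fast path: a plain update, no lock taken here.
        self.value += 1

    def owner_poll(self):
        # Dominant thread drains pending requests and runs each one itself.
        while True:
            try:
                func, done, result = self.inbox.get_nowait()
            except queue.Empty:
                return
            result.append(func(self))
            done.set()

    def remote_call(self, func):
        # Non-dominant slow path: post the request, then block until served.
        done, result = threading.Event(), []
        self.inbox.put((func, done, result))
        done.wait()
        return result[0]


def bump(counter):
    # Critical section body; always executed by the dominant thread.
    counter.value += 1
    return counter.value


def run_demo(owner_ops=1000, remote_threads=4, remote_ops=25):
    counter = OwnedCounter()

    def remote_worker():
        for _ in range(remote_ops):
            counter.remote_call(bump)

    workers = [threading.Thread(target=remote_worker)
               for _ in range(remote_threads)]
    for w in workers:
        w.start()

    # The calling thread plays the role of the dominant thread.
    for _ in range(owner_ops):
        counter.owner_increment()
        counter.owner_poll()

    # Keep serving requests until every non-dominant thread has finished.
    while any(w.is_alive() for w in workers):
        counter.owner_poll()
        time.sleep(0)  # yield so waiting threads can make progress
    for w in workers:
        w.join()
    return counter.value
```

In this sketch the cost of synchronization falls on the rare non-dominant callers, which must wait for the dominant thread to poll. The dominant thread pays only for an occasional check of its inbox, which is the asymmetry the abstract describes.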