Remote core locking: migrating critical-section execution to improve the performance of multithreaded applications

Authors:
Jean-Pierre Lozi;Florian David;Gaël Thomas;Julia Lawall;Gilles Muller
Affiliations:
LIP6, INRIA;LIP6, INRIA;LIP6, INRIA;LIP6, INRIA;LIP6, INRIA
Venue:
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Year:
2012

Abstract

The scalability of multithreaded applications on current multicore systems is hampered by the performance of lock algorithms, due to the costs of access contention and cache misses. In this paper, we propose a new lock algorithm, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions with optimized remote procedure calls to a dedicated server core. RCL limits the performance collapse observed with other lock algorithms when many threads try to acquire a lock concurrently. It also removes the need to transfer lock-protected shared data to the core acquiring the lock, because such data can typically remain in the server core's cache. We have developed a profiler that identifies the locks that are the bottlenecks in multithreaded applications and that can thus benefit from RCL, and a reengineering tool that transforms POSIX locks into RCL locks. We have evaluated our approach on 18 applications: Memcached, Berkeley DB, the 9 applications of the SPLASH-2 benchmark suite and the 7 applications of the Phoenix2 benchmark suite. Ten of these applications, including Memcached and Berkeley DB, are unable to scale because of locks, and benefit from RCL. Using RCL locks, we get performance improvements of up to 2.6 times with respect to POSIX locks on Memcached, and up to 14 times with respect to Berkeley DB.
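
The core idea in the abstract, sending a critical section to a dedicated server instead of acquiring a lock, can be illustrated with a small sketch. This is not the authors' implementation, which the abstract does not describe in detail. It is a minimal Python illustration of the delegation structure only, in which a single server thread stands in for the server core. The names `DelegationServer`, `execute`, and `run_demo` are hypothetical, and the sketch makes no attempt to reproduce RCL's cache behavior or performance.

```python
import queue
import threading


class DelegationServer:
    """Hypothetical stand-in for an RCL server core: one thread that
    executes every critical section on behalf of the client threads."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # Critical sections run one at a time, in arrival order,
        # so no lock is needed around the shared data they touch.
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                return
            func, args, reply = item
            try:
                reply["result"] = func(*args)
            except Exception as exc:  # hand the error back to the caller
                reply["error"] = exc
            reply["done"].set()

    def execute(self, func, *args):
        """Run func(*args) on the server thread and wait for the result.
        This call takes the place of lock(); func(); unlock()."""
        reply = {"done": threading.Event()}
        self._requests.put((func, args, reply))
        reply["done"].wait()
        if "error" in reply:
            raise reply["error"]
        return reply["result"]

    def shutdown(self):
        self._requests.put(None)
        self._thread.join()


def run_demo(n_threads=8, n_increments=1000):
    server = DelegationServer()
    counter = {"value": 0}  # shared data, modified only by the server thread

    def increment():
        counter["value"] += 1
        return counter["value"]

    def client():
        for _ in range(n_increments):
            server.execute(increment)

    threads = [threading.Thread(target=client) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    server.shutdown()
    return counter["value"]


if __name__ == "__main__":
    print(run_demo())
```

Because only the server thread ever modifies `counter`, the shared data stays with one executor, which mirrors the abstract's point that lock-protected data can remain in the server core's cache rather than moving to each core that acquires the lock.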