TLRW: return of the read-write lock

Authors:
Dave Dice; Nir Shavit
Affiliations:
Sun Labs at Oracle, Burlington, MA, USA; Sun Labs at Oracle, Burlington, MA, USA
Venue:
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Year:
2010

Abstract

TL2 and similar STM algorithms deliver high scalability based on write-locking and invisible readers. In fact, no modern STM design locks to read along its common execution path, because doing so would require a memory synchronization operation that would greatly hamper performance. In this paper we introduce TLRW, a new STM algorithm intended for the single-chip multicore systems that are quickly taking over a large fraction of the computing landscape. We claim that the cost of coherence in such single-chip systems has dropped to a level that allows one to design a scalable STM based on read-write locks. TLRW is based on byte-locks, a novel read-write lock design with a low read-lock acquisition overhead and the ability to take advantage of the locality of reference within transactions. As we show, TLRW has a painfully simple design, one that naturally provides three properties: coherent state without validation, implicit privatization, and irrevocable transactions. Providing similar properties in STMs based on invisible readers (such as TL2) has typically resulted in a major loss of performance. In a series of benchmarks we show that when running on a 64-way single-chip multicore machine, TLRW delivers surprisingly good performance (competitive with and sometimes outperforming TL2). However, on a 128-way 2-chip system that has higher coherence costs across the interconnect, performance deteriorates rapidly. We believe our work raises the question of whether, on single-chip multicore machines, read-write lock-based STMs are the way to go.
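
The abstract names byte-locks but does not describe their layout, so the following is only an illustrative sketch of how a read-write lock with cheap, visible readers might look, not the authors' implementation. It assumes that each reader thread has been assigned its own byte-wide slot, that a single word identifies the writer, and that both sides use non-blocking "try" operations. The class name ByteLock, the slot count kSlots, and all method names are hypothetical.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

// Illustrative sketch only: names, slot count, and try-style API are assumptions.
class ByteLock {
public:
    static constexpr int kSlots = 48;  // assumed number of reader slots

    ByteLock() : owner_(0) {
        for (auto& b : readers_) b.store(0, std::memory_order_relaxed);
    }

    // A reader announces itself with a plain store to its own byte,
    // then checks for a writer. No compare-and-swap on the read path.
    bool try_read_lock(int slot) {
        readers_[slot].store(1, std::memory_order_seq_cst);
        if (owner_.load(std::memory_order_seq_cst) != 0) {
            // A writer holds or is acquiring the lock: withdraw and report failure.
            readers_[slot].store(0, std::memory_order_release);
            return false;
        }
        return true;
    }

    void read_unlock(int slot) {
        readers_[slot].store(0, std::memory_order_release);
    }

    // A writer claims the owner word, then checks that no reader byte is set.
    // writer_id must be non-zero. This sketch gives up instead of waiting
    // for readers to drain, and does not support read-to-write upgrade.
    bool try_write_lock(std::uint32_t writer_id) {
        std::uint32_t expected = 0;
        if (!owner_.compare_exchange_strong(expected, writer_id,
                                            std::memory_order_seq_cst)) {
            return false;  // another writer owns the lock
        }
        for (auto& b : readers_) {
            if (b.load(std::memory_order_seq_cst) != 0) {
                owner_.store(0, std::memory_order_release);  // readers present
                return false;
            }
        }
        return true;
    }

    void write_unlock() { owner_.store(0, std::memory_order_release); }

private:
    std::atomic<std::uint32_t> owner_;                       // 0 means no writer
    std::array<std::atomic<std::uint8_t>, kSlots> readers_;  // one byte per reader
};
```

In this sketch a reader that re-reads a location it has already locked touches only its own byte, which is one way such a lock could exploit locality of reference within a transaction. The sequentially consistent store-then-load on each side ensures that a reader and a writer cannot both succeed at once, and that ordering requirement is the kind of memory synchronization cost the abstract refers to.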