Linearizability: a correctness condition for concurrent objects
ACM Transactions on Programming Languages and Systems (TOPLAS)
Introduction to Algorithms
Queue Locks on Cache Coherent Multiprocessors
Proceedings of the 8th International Symposium on Parallel Processing
Software transactional memory for dynamic-sized data structures
Proceedings of the twenty-second annual symposium on Principles of distributed computing
McRT-STM: a high performance software transactional memory system for a multi-core runtime
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Transactional memory and the birthday paradox
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Understanding Tradeoffs in Software Transactional Memory
Proceedings of the International Symposium on Code Generation and Optimization
SNZI: scalable NonZero indicators
Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing
On the correctness of transactional memory
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Toward high performance nonblocking software transactional memory
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Dynamic performance tuning of word-based software transactional memory
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
RingSTM: scalable transactions with a single atomic instruction
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Irrevocable transactions and their applications
Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Scalable Techniques for Transparent Privatization in Software Transactional Memory
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
An efficient transactional memory algorithm for computing minimum spanning forest of sparse graphs
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
NZTM: nonblocking zero-indirection transactional memory
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
The Art of Multiprocessor Programming
The Art of Multiprocessor Programming
DISC'06 Proceedings of the 20th international conference on Distributed Computing
A lazy snapshot algorithm with eager validation
DISC'06 Proceedings of the 20th international conference on Distributed Computing
The multikernel: a new OS architecture for scalable multicore systems
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
Brief announcement: single-version permissive STM
Proceedings of the 29th ACM SIGACT-SIGOPS symposium on Principles of distributed computing
DISC'10 Proceedings of the 24th international conference on Distributed computing
Architectural Support for Fair Reader-Writer Locking
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Single-version STMs can be multi-version permissive
ICDCN'11 Proceedings of the 12th international conference on Distributed computing and networking
Lightweight parallel accumulators using C++ templates
Proceedings of the 4th International Workshop on Multicore Software Engineering
A transactional memory with automatic performance tuning
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Optimizing software runtime systems for speculative parallelization
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II
NUMA-aware reader-writer locks
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
TL2 and similar STM algorithms deliver high scalability based on write-locking and invisible readers. In fact, no modern STM design locks to read along its common execution path because doing so would require a memory synchronization operation that would greatly hamper performance. In this paper we introduce TLRW, a new STM algorithm intended for the single-chip multicore systems that are quickly taking over a large fraction of the computing landscape. We make the claim that the cost of coherence in such single chip systems is down to a level that allows one to design a scalable STM based on read-write locks. TLRW is based on byte-locks, a novel read-write lock design with a low read-lock acquisition overhead and the ability to take advantage of the locality of reference within transactions. As we show, TLRW has a painfully simple design, one that naturally provides coherent state without validation, implicit privatization, and irrevocable transactions. Providing similar properties in STMs based on invisible-readers (such as TL2) has typically resulted in a major loss of performance. In a series of benchmarks we show that when running on a 64-way single-chip multicore machine, TLRW delivers surprisingly good performance (competitive with and sometimes outperforming TL2). However, on a 128-way 2-chip system that has higher coherence costs across the interconnect, performance deteriorates rapidly. We believe our work raises the question of whether on single-chip multicore machines, read-write lock-based STMs are the way to go.