Reordering constraints for pthread-style locks
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Performance of memory reclamation for lockless synchronization
Journal of Parallel and Distributed Computing
Why the grass may not be greener on the other side: a comparison of locking vs. transactional memory
Proceedings of the 4th workshop on Programming languages and operating systems
Introducing technology into the Linux kernel: a case study
ACM SIGOPS Operating Systems Review - Research and developments in the Linux kernel
Garbage collection in the next C++ standard
Proceedings of the 2009 international symposium on Memory management
Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles
NOrec: streamlining STM by abolishing ownership records
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Why the grass may not be greener on the other side: a comparison of locking vs. transactional memory
ACM SIGOPS Operating Systems Review
Scalable concurrent hash tables via relativistic programming
ACM SIGOPS Operating Systems Review
Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Synchronization for fast and reentrant operating system kernel tracing
Software—Practice & Experience - Focus on Selected PhD Literature Reviews in the Practical Aspects of Software Technology
Making lockless synchronization fast: performance implications of memory reclamation
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
A relativistic enhancement to software transactional memory
HotPar'11 Proceedings of the 3rd USENIX conference on Hot topic in parallelism
Resizable, scalable, concurrent hash tables via relativistic programming
USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference
Scalable address spaces using RCU balanced trees
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Can seqlocks get along with programming language memory models?
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Position paper: nondeterminism is unavoidable, but data races are pure evil
Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability
A case for relativistic programming
Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability
Lockless multi-core high-throughput buffering scheme for kernel tracing
ACM SIGOPS Operating Systems Review
Verifying concurrent memory reclamation algorithms with grace
ESOP'13 Proceedings of the 22nd European conference on Programming Languages and Systems
Nonblocking algorithms and scalable multicore programming
Communications of the ACM
Nonblocking Algorithms and Scalable Multicore Programming
Queue - Concurrency
Proceedings of the 11th ACM international symposium on Mobility management and wireless access
Towards a scalable microkernel personality for multicore processors
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Hi-index | 0.02 |
The Moore's-Law-driven performance of simple instructions has improved by orders of magnitude over the past two decades, but shared-memory multiprocessor (SMMP) synchronization operations have not kept pace. SMMP software uses synchronization operations heavily, thus suffering degraded performance and scalability. As a result, many traditional SMMP algorithms are now obsolete. This dissertation presents read-copy update (RCU), a reader-writer synchronization mechanism in which read-side critical sections incur virtually zero synchronization overhead, thereby achieving near-ideal performance for read-mostly workloads. Write-side critical sections incur substantial synchronization overhead, deferring destruction and maintaining multiple versions of data structures in order to accommodate the synchronization-free read-side critical sections. In addition, writers use some mechanism, such as locking, to ensure orderly updates. Readers provide a signal enabling writers to determine when it is safe to complete destructive operations, but this signal may be deferred, permitting a single signal operation to serve multiple read-side RCU critical sections. These read-side signals are observed by a specialized garbage collector, which carries out destructive operations once it is safe to do so. Garbage collectors are typically implemented in a manner similar to a barrier computation. Production-quality garbage collectors batch destructive operations, amortizing signal-observation overhead over many updates. Although RCU is not itself new, its use has been quite specialized. This dissertation rectifies this situation by showing how RCU can be implemented efficiently in operating system kernels, by demonstrating its system-level performance and complexity benefits, and by providing a set of design patterns that make RCU more generally applicable. This dissertation compares RCU to traditional synchronization mechanisms, including locking and non-blocking synchronization, using both analytic and empirical methods. The empirical methods include both informal micro-benchmarks and formal system-level benchmarks. These benchmarks show performance benefits ranging from tens of percent to an order of magnitude and little or no increase in code complexity. Finally, this dissertation demonstrates that RCU has practical value by (1) outlining its use in several production systems, two of which have seen extensive datacenter use, one of which this author designed and implemented, and (2) documenting its widespread use in the Linux 2.6 kernel.