Concurrent maintenance of skip lists
Concurrent maintenance of skip lists
Efficient locking for concurrent operations on B-trees
ACM Transactions on Database Systems (TODS)
Binary search trees of bounded balance
STOC '72 Proceedings of the fourth annual ACM symposium on Theory of computing
Exploiting deferred destruction: an analysis of read-copy-update techniques in operating system kernels
Scalable locality-conscious multithreaded memory allocation
Proceedings of the 5th international symposium on Memory management
Concurrent programming without locks
ACM Transactions on Computer Systems (TOCS)
OverCite: a distributed, cooperative citeseer
NSDI'06 Proceedings of the 3rd conference on Networked Systems Design & Implementation - Volume 3
Introducing technology into the Linux kernel: a case study
ACM SIGOPS Operating Systems Review - Research and developments in the Linux kernel
Corey: an operating system for many cores
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
An analysis of Linux scalability to many cores
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Benchmarking modern multiprocessors
Benchmarking modern multiprocessors
Beyond expert-only parallel programming?
Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability
DISC'12 Proceedings of the 26th international conference on Distributed Computing
Parallelizing live migration of virtual machines
Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
A scalable lock manager for multicores
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
RadixVM: scalable address spaces for multithreaded applications
Proceedings of the 8th ACM European Conference on Computer Systems
Application level ballooning for efficient server consolidation
Proceedings of the 8th ACM European Conference on Computer Systems
Proposing a new task model towards many-core architecture
Proceedings of the First International Workshop on Many-core Embedded Systems
Introducing kernel-level page reuse for high performance computing
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Proceedings of the 4th Asia-Pacific Workshop on Systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
The scalable commutativity rule: designing scalable software for multicore processors
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
A lightweight infrastructure for graph analytics
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
Large-reach memory management unit caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Software developers commonly exploit multicore processors by building multithreaded software in which all threads of an application share a single address space. This shared address space has a cost: kernel virtual memory operations such as handling soft page faults, growing the address space, mapping files, etc. can limit the scalability of these applications. In widely-used operating systems, all of these operations are synchronized by a single per-process lock. This paper contributes a new design for increasing the concurrency of kernel operations on a shared address space by exploiting read-copy-update (RCU) so that soft page faults can both run in parallel with operations that mutate the same address space and avoid contending with other page faults on shared cache lines. To enable such parallelism, this paper also introduces an RCU-based binary balanced tree for storing memory mappings. An experimental evaluation using three multithreaded applications shows performance improvements on 80 cores ranging from 1.7x to 3.4x for an implementation of this design in the Linux 2.6.37 kernel. The RCU-based binary tree enables soft page faults to run at a constant cost with an increasing number of cores,suggesting that the design will scale well beyond 80 cores.