Scalable address spaces using RCU balanced trees

Authors:
Austin T. Clements; M. Frans Kaashoek; Nickolai Zeldovich
Affiliations:
MIT, Cambridge, MA, USA (all three authors)
Venue:
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Year:
2012

Abstract

Software developers commonly exploit multicore processors by building multithreaded software in which all threads of an application share a single address space. This shared address space has a cost: kernel virtual memory operations such as handling soft page faults, growing the address space, and mapping files can limit the scalability of these applications. In widely used operating systems, all of these operations are synchronized by a single per-process lock. This paper contributes a new design for increasing the concurrency of kernel operations on a shared address space by exploiting read-copy-update (RCU) so that soft page faults can both run in parallel with operations that mutate the same address space and avoid contending with other page faults on shared cache lines. To enable such parallelism, this paper also introduces an RCU-based binary balanced tree for storing memory mappings. An experimental evaluation using three multithreaded applications shows performance improvements on 80 cores ranging from 1.7x to 3.4x for an implementation of this design in the Linux 2.6.37 kernel. The RCU-based binary tree enables soft page faults to run at a constant cost with an increasing number of cores, suggesting that the design will scale well beyond 80 cores.
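
The read-copy-update idea in the abstract can be illustrated with a small user-space sketch. This is not the authors' kernel implementation: the names `Node`, `insert`, and `AddressSpace` are hypothetical, rebalancing is omitted, regions are assumed not to overlap, and Python's garbage collector stands in for the deferred reclamation that RCU performs after a grace period. The sketch only shows the core pattern: a writer copies the search path and publishes a new root with a single reference store, while readers (standing in for page faults) traverse whatever root they observed without taking a lock.

```python
import threading
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Immutable tree node describing one mapped region [start, end)."""
    start: int
    end: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], start: int, end: int) -> Node:
    """Return a new tree that also contains [start, end).

    Only the nodes on the search path are copied; all other subtrees
    are shared with the old tree. No rebalancing is done in this sketch.
    """
    if node is None:
        return Node(start, end)
    if start < node.start:
        return Node(node.start, node.end,
                    insert(node.left, start, end), node.right)
    return Node(node.start, node.end,
                node.left, insert(node.right, start, end))


class AddressSpace:
    """Hypothetical stand-in for a per-process set of memory mappings."""

    def __init__(self) -> None:
        self._root: Optional[Node] = None
        self._write_lock = threading.Lock()  # serializes writers only

    def map(self, start: int, end: int) -> None:
        """Writer: build the updated tree, then publish it in one store."""
        with self._write_lock:
            self._root = insert(self._root, start, end)

    def lookup(self, addr: int) -> Optional[Node]:
        """Reader: take a snapshot of the root and search it lock-free."""
        node = self._root
        while node is not None:
            if addr < node.start:
                node = node.left
            elif addr >= node.end:
                node = node.right
            else:
                return node
        return None
```

A reader that loaded the old root before a concurrent `map` call keeps seeing a consistent, unmodified tree, which is why lookups never need to wait for writers in this pattern. In a kernel, the old path nodes could only be freed once all such readers had finished.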