Lock-contention-aware scheduler: A scalable and energy-efficient method for addressing scalability collapse on multicore systems

Authors:
Yan Cui; Yingxin Wang; Yu Chen; Yuanchun Shi
Affiliations:
Tsinghua University, Beijing, China (all four authors)
Venue:
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Year:
2013

Abstract

In response to the increasing ubiquity of multicore processors, there has been widespread development of multithreaded applications that strive to realize their full potential. Unfortunately, lock contention within operating systems can limit the scalability of multicore systems so severely that an increase in the number of cores can actually lead to reduced performance (i.e., scalability collapse). Existing efforts to solve scalability collapse mainly focus on making critical sections of kernel code fine-grained or designing new synchronization primitives. However, these methods have disadvantages in scalability or energy efficiency. In this article, we observe that the percentage of lock-waiting time over the total execution time for a lock-intensive task has a significant correlation with the occurrence of scalability collapse. Based on this observation, a lock-contention-aware scheduler is proposed. Specifically, each task in the scheduler continuously monitors its percentage of lock-waiting time. If the percentage exceeds a predefined threshold, the task is considered lock intensive and is migrated to a Special Set of Cores (SSC). In this way, the number of concurrently running lock-intensive tasks is limited to the number of cores in the SSC, and therefore the degree of lock contention is controlled. A central challenge of this scheme is deciding how many cores should be allocated to the SSC to handle lock-intensive tasks. In our scheduler, the optimal number of cores is determined online by a model-driven search. The proposed scheduler is implemented in a recent Linux kernel and evaluated using micro- and macrobenchmarks on AMD and Intel 32-core systems. Experimental results suggest that our proposal is able to remove scalability collapse completely and sustain the maximal throughput of the spin-lock-based system for most applications. Furthermore, the percentage of lock-waiting time can be reduced by up to 84%. When compared with scalability collapse reduction methods such as the requester-based locking scheme and sleeping-based synchronization primitives, our scheme exhibits significant advantages in scalability, power consumption, and energy efficiency.
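
The threshold rule described in the abstract can be illustrated with a small user-space sketch. It is not the authors' kernel implementation: the threshold value, the SSC size, the `Task` fields, and the `allowed_cores` helper are all hypothetical, and the sketch assumes that tasks below the threshold run on the remaining cores, which the abstract does not state.

```python
from dataclasses import dataclass

# Assumed values for illustration only; the abstract does not give them.
LOCK_WAIT_THRESHOLD = 0.30    # fraction of execution time spent waiting on locks
ALL_CORES = list(range(32))   # matches the 32-core systems used in the evaluation
SSC_SIZE = 4                  # hypothetical; the paper determines this online


@dataclass
class Task:
    name: str
    lock_wait_time: float     # seconds spent waiting for locks
    total_time: float         # total execution time in seconds

    def lock_wait_fraction(self) -> float:
        """Share of execution time spent waiting for locks (0.0 if no time yet)."""
        if self.total_time <= 0:
            return 0.0
        return self.lock_wait_time / self.total_time


def allowed_cores(task, ssc, normal, threshold=LOCK_WAIT_THRESHOLD):
    """Return the core set a task may run on under the threshold rule."""
    if task.lock_wait_fraction() > threshold:
        return ssc            # lock intensive: confine to the Special Set of Cores
    return normal             # assumption: other tasks use the remaining cores


ssc = ALL_CORES[:SSC_SIZE]
normal = ALL_CORES[SSC_SIZE:]

tasks = [Task("a", 4.0, 10.0), Task("b", 0.5, 10.0)]
placement = {t.name: allowed_cores(t, ssc, normal) for t in tasks}
```

Here task `a` spends 40% of its time waiting on locks and is confined to the SSC, while task `b` spends 5% and is not. Because lock-intensive tasks can only occupy SSC cores, at most `SSC_SIZE` of them run concurrently, which is how the scheme bounds contention.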