Lock-contention-aware scheduler: A scalable and energy-efficient method for addressing scalability collapse on multicore systems

  • Authors:
  • Yan Cui;Yingxin Wang;Yu Chen;Yuanchun Shi

  • Affiliations:
  • Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

In response to the increasing ubiquity of multicore processors, there has been widespread development of multithreaded applications that strive to realize their full potential. Unfortunately, lock contention within operating systems can limit the scalability of multicore systems so severely that an increase in the number of cores can actually lead to reduced performance (i.e., scalability collapse). Existing efforts of solving scalability collapse mainly focus on making critical sections of kernel code fine-grained or designing new synchronization primitives. However, these methods have disadvantages in scalability or energy efficiency. In this article, we observe that the percentage of lock-waiting time over the total execution time for a lock intensive task has a significant correlation with the occurrence of scalability collapse. Based on this observation, a lock-contention-aware scheduler is proposed. Specifically, each task in the scheduler monitors its percentage of lock waiting time continuously. If the percentage exceeds a predefined threshold, this task is considered as lock intensive and migrated to a Special Set of Cores (i.e., SSC). In this way, the number of concurrently running lock-intensive tasks is limited to the number of cores in the SSC, and therefore, the degree of lock contention is controlled. A central challenge of using this scheme is how many cores should be allocated in the SSC to handle lock-intensive tasks. In our scheduler, the optimal number of cores is determined online by the model-driven search. The proposed scheduler is implemented in the recent Linux kernel and evaluated using micro- and macrobenchmarks on AMD and Intel 32-core systems. Experimental results suggest that our proposal is able to remove scalability collapse completely and sustains the maximal throughput of the spin-lock-based system for most applications. Furthermore, the percentage of lock-waiting time can be reduced by up to 84%. When compared with scalability collapse reduction methods such as requester-based locking scheme and sleeping-based synchronization primitives, our scheme exhibits significant advantages in scalability, power consumption, and energy efficiency.