Optimizing virtual machine scheduling in NUMA multicore systems

Authors:
Jia Rao; Kun Wang; Xiaobo Zhou; Cheng-Zhong Xu
Affiliations:
Dept. of Computer Science, University of Colorado, Colorado Springs, USA; Dept. of Electrical and Computer Engineering, Wayne State University, USA; Dept. of Computer Science, University of Colorado, Colorado Springs, USA; Dept. of Electrical and Computer Engineering, Wayne State University, USA
Venue:
HPCA '13 Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
Year:
2013

Abstract

An increasing number of new multicore systems use the Non-Uniform Memory Access (NUMA) architecture because of its scalable memory performance. However, the complex interplay among data locality, contention on shared on-chip memory resources, and cross-node data sharing overhead makes it difficult to deliver optimal and predictable program performance. Virtualization further complicates the scheduling problem: because the mappings from virtual hardware to machine hardware are abstract and inaccurate, program- and system-level optimizations are often ineffective within virtual machines. We find that the penalty to access the “uncore” memory subsystem is an effective metric for predicting program performance in NUMA multicore systems. Based on this metric, we add NUMA awareness to virtual machine scheduling. We propose a Bias Random vCPU Migration (BRM) algorithm that dynamically migrates vCPUs to minimize the system-wide uncore penalty. We have implemented the scheme in the Xen virtual machine monitor. Experimental results on a two-way Intel NUMA multicore system with various workloads show that BRM improves application performance by up to 31.7% compared with the default Xen credit scheduler. Moreover, BRM achieves predictable performance with, on average, no more than 2% runtime variation.
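
The abstract does not spell out how BRM chooses migrations, so the following is only a toy sketch of the general idea of a biased random migration, not the authors' algorithm or the Xen implementation. Everything in it is an assumption made for illustration: the names `system_penalty` and `biased_random_migration`, the `remote_cost` table standing in for a measured uncore penalty, the linear contention term, and the rule that a vCPU with a higher current penalty is more likely to attempt a move.

```python
import random
from collections import Counter


def system_penalty(placement, remote_cost, contention_weight=0.5):
    """Hypothetical system-wide penalty: per-vCPU remote-access cost on its
    current node plus a contention term that grows with co-located vCPUs."""
    load = Counter(placement.values())
    return sum(
        remote_cost[vcpu][node] + contention_weight * (load[node] - 1)
        for vcpu, node in placement.items()
    )


def biased_random_migration(placement, remote_cost, num_nodes,
                            rounds=50, contention_weight=0.5, seed=0):
    """Toy biased random migration (assumes num_nodes >= 2).

    Each round, every vCPU attempts a move to a random other node with a
    probability that rises with its own current penalty (the "bias"). A move
    is kept only if it lowers the system-wide penalty.
    """
    rng = random.Random(seed)
    placement = dict(placement)
    for _ in range(rounds):
        for vcpu in list(placement):
            node = placement[vcpu]
            load = Counter(placement.values())
            own = remote_cost[vcpu][node] + contention_weight * (load[node] - 1)
            bias = own / (own + 1.0)  # assumed mapping of penalty into [0, 1)
            if rng.random() >= bias:
                continue  # low-penalty vCPUs mostly stay where they are
            candidate = rng.choice([n for n in range(num_nodes) if n != node])
            trial = dict(placement)
            trial[vcpu] = candidate
            if (system_penalty(trial, remote_cost, contention_weight)
                    < system_penalty(placement, remote_cost, contention_weight)):
                placement = trial
    return placement


# Two NUMA nodes, four vCPUs; each vCPU's memory lives on the node with cost 0.
remote_cost = {0: [0.0, 2.0], 1: [0.0, 2.0], 2: [2.0, 0.0], 3: [2.0, 0.0]}
start = {0: 1, 1: 1, 2: 0, 3: 0}  # every vCPU starts on the "wrong" node
final = biased_random_migration(start, remote_cost, num_nodes=2)
print(system_penalty(start, remote_cost), system_penalty(final, remote_cost))
```

Because a move is accepted only when it reduces the total, the penalty in this sketch never increases from one step to the next. A real scheduler would derive the penalty from hardware performance counters rather than a static table, and would have to weigh migration cost as well.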