As multicore processors become prevalent in modern computer systems, there is a growing need to increase hardware utilization and exploit the parallelism of such platforms. Virtualization improves hardware utilization by encapsulating independent workloads in virtual machines (VMs) and consolidating them onto the same machine, and SMP virtual machines have been widely adopted to exploit parallelism. For virtualized systems such as a public cloud, fairness between tenants and the efficiency of running their applications are keys to success. However, we find that existing virtualization platforms fail to enforce fairness between VMs with different numbers of virtual CPUs (vCPUs) that run on multiple CPUs. We attribute the unfairness to the use of per-CPU schedulers combined with load imbalance across these CPUs, which leads to inaccurate CPU allocations. Unfortunately, existing approaches to reducing unfairness, e.g., dynamic load balancing and CPU capping, introduce significant inefficiencies for parallel workloads. In this paper, we present Flex, a vCPU scheduling scheme that enforces fairness at the VM level and improves the efficiency of hosted parallel applications. Flex centers on two key designs: (1) dynamically adjusting vCPU weights (FlexW) on multiple CPUs to achieve VM-level fairness and (2) flexibly scheduling vCPUs (FlexS) to minimize wasted busy-waiting time. We have implemented Flex in Xen and performed comprehensive evaluations with various parallel workloads. Results show that Flex achieves CPU allocations that deviate, on average, by no more than 5% from the ideal fair allocation. Further, Flex outperforms Xen's credit scheduler and two representative co-scheduling approaches by as much as 10X for parallel applications that use busy-waiting or blocking synchronization methods.
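The notion of an "ideal fair allocation" at the VM level, and the error measured against it, can be made concrete with a small sketch. The Python below is an illustration only and is not Flex's algorithm: it assumes that the ideal share of each VM is proportional to its weight, that a VM can never use more CPUs than it has vCPUs, and that any unused capacity is redistributed among the remaining VMs. The function names `ideal_fair_shares` and `mean_allocation_error`, and the choice of mean relative error as the metric, are hypothetical.

```python
def ideal_fair_shares(vms, num_cpus):
    """Weight-proportional CPU shares, capped by each VM's vCPU count.

    vms: dict mapping VM name -> (weight, vcpus)   (assumed input format)
    Returns a dict mapping VM name -> share, in units of physical CPUs.
    """
    shares = {name: 0.0 for name in vms}
    active = set(vms)              # VMs that have not hit their vCPU cap
    remaining = float(num_cpus)    # capacity not yet handed out

    while active and remaining > 1e-12:
        total_weight = sum(vms[name][0] for name in active)
        # VMs whose proportional share would meet or exceed their vCPU count.
        capped = {
            name for name in active
            if remaining * vms[name][0] / total_weight >= vms[name][1]
        }
        if not capped:
            # Nobody is limited by vCPU count: split what is left by weight.
            for name in active:
                shares[name] = remaining * vms[name][0] / total_weight
            break
        # Give capped VMs exactly their vCPU count and redistribute the rest.
        for name in capped:
            shares[name] = float(vms[name][1])
            remaining -= vms[name][1]
        active -= capped
    return shares


def mean_allocation_error(actual, ideal):
    """Mean relative deviation of measured shares from the ideal shares."""
    errors = [abs(actual[name] - ideal[name]) / ideal[name] for name in ideal]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Two equal-weight VMs with 2 and 4 vCPUs on a 4-CPU host (made-up setup).
    vms = {"vm_a": (256, 2), "vm_b": (256, 4)}
    ideal = ideal_fair_shares(vms, num_cpus=4)
    # A per-vCPU split would hand out CPU time in proportion to vCPU count.
    per_vcpu = {"vm_a": 4 * 2 / 6, "vm_b": 4 * 4 / 6}
    print(ideal, mean_allocation_error(per_vcpu, ideal))
```

In this made-up setup both VMs carry the same weight, so the ideal VM-level split gives each of them two CPUs, whereas an allocation proportional to vCPU count favors the 4-vCPU VM. This is the kind of unfairness between VMs with different vCPU counts that the abstract describes.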