Hardware support for spin management in overcommitted virtual machines

Authors:
Philip M. Wells; Koushik Chakraborty; Gurindar S. Sohi
Affiliations:
University of Wisconsin, Madison; University of Wisconsin, Madison; University of Wisconsin, Madison
Venue:
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Year:
2006

Abstract

Multiprocessor operating systems (OSs) pose several unique and conflicting challenges to System Virtual Machines (System VMs). For example, most existing system VMs resort to gang scheduling a guest OS's virtual processors (VCPUs) to avoid OS synchronization overhead. However, gang scheduling is infeasible for some application domains, and is inflexible in other domains.

In an overcommitted environment, an individual guest OS has more VCPUs than available physical processors (PCPUs), precluding the use of gang scheduling. In such an environment, we demonstrate a more than two-fold increase in runtime when transparently virtualizing a chip-multiprocessor's cores. To combat this problem, we propose a hardware technique to detect several cases when a VCPU is not performing useful work, and suggest preempting that VCPU to run a different, more productive VCPU. Our technique can dramatically reduce cycles wasted on OS synchronization, without requiring any semantic information from the software.

We then present a case study, typical of server consolidation, to demonstrate the potential of more flexible scheduling policies enabled by our technique. We propose one such policy that logically partitions the CMP cores between guest VMs. This policy increases throughput by 10-25% for consolidated server workloads due to improved cache locality and core utilization, and substantially improves performance isolation in private caches.