Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Efficient, transparent, and comprehensive runtime code manipulation
Efficient, transparent, and comprehensive runtime code manipulation
A comparison of software and hardware techniques for x86 virtualization
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Valgrind: a framework for heavyweight dynamic binary instrumentation
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
lmbench: portable tools for performance analysis
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Accelerating two-dimensional page walks for virtualized systems
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Binary translation using peephole superoptimizers
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Software techniques for avoiding hardware virtualization exits
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Hi-index | 0.00 |
Power Architecture® processors are popular and widespread on embedded systems, and such platforms are increasingly being used to run virtual machines. While the Power Architecture meets the Popek-and-Goldberg virtualization requirements for traditional trap-and-emulate style virtualization, the performance overhead of virtualization remains high. For example, workloads exhibiting a large amount of kernel activity typically show 3-5x slowdowns over bare-metal. Recent additions to the Linux kernel contain guest and host side paravirtual extensions for Power Architecture platforms. While these extensions improve performance significantly, they are guest-specific, guest-intrusive, and cover only a subset of all possible virtualization optimizations. We present a set of host-side optimizations that achieve comparable performance to the aforementioned paravirtual extensions, on an unmodified guest. Our optimizations are based on adaptive in-place binary translation. Unlike the paravirtual approach, our solution is guest neutral. We implement our ideas in a prototype based on Qemu/KVM. After our modifications, KVM can boot an unmodified Linux guest around 2.5x faster. We contrast our optimization approach with previous similar binary translation based approaches for the x86 architecture; in our experience, each architecture presents a unique set of challenges and optimization opportunities.