Cloud providers are increasingly looking to use virtual machine checkpointing for new applications beyond fault tolerance. Existing checkpointing systems, designed for fault tolerance, optimize only for saving checkpointed state, so they cannot support these new applications, which require better restore performance. Improving restore performance requires a predictive technique that reduces the number of disk accesses needed to bring in the VM's memory on restore. However, complex VM workloads can diverge at any time due to external inputs, background processes, and timing variation, so it is impossible to reduce faults to disk by predicting which pages the VM will access on restore. Instead, we focus on predicting which pages the VM will access together on restore, which improves the efficiency of disk accesses. To reduce the number of faults to disk on restore, we group memory pages that are likely to be accessed together into locality blocks. On each fault, we load a block of pages that are likely to be accessed with the faulting page, eliminating future faults and increasing disk efficiency. We implement support for locality blocks, along with several other optimizations, in a new checkpointing system for VMware ESXi Server called Halite. Our experiments show that Halite reduces restore overhead by up to 94% for a range of workloads.
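The locality-block idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Halite's implementation: the abstract does not say how pages are grouped, so the sketch assumes a hypothetical heuristic that packs pages into fixed-size blocks in order of first access in a trace recorded before the checkpoint. The names `build_locality_blocks` and `count_restore_faults`, the `block_size` parameter, and the sample traces are all illustrative.

```python
from typing import Dict, Iterable, List, Set


def build_locality_blocks(access_trace: Iterable[int], block_size: int) -> List[List[int]]:
    """Pack pages into fixed-size blocks by order of first access.

    ASSUMPTION: first-access order is a stand-in grouping heuristic; the
    abstract does not describe how Halite forms its locality blocks.
    """
    seen: Set[int] = set()
    ordered: List[int] = []
    for page in access_trace:
        if page not in seen:
            seen.add(page)
            ordered.append(page)
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def count_restore_faults(blocks: List[List[int]], restore_accesses: Iterable[int]) -> int:
    """Count disk faults when each fault loads the faulting page's whole block."""
    block_of: Dict[int, int] = {}
    for idx, block in enumerate(blocks):
        for page in block:
            block_of[page] = idx

    resident: Set[int] = set()  # pages already brought into memory
    faults = 0
    for page in restore_accesses:
        if page in resident:
            continue  # loaded by an earlier fault on the same block
        faults += 1
        idx = block_of.get(page)
        if idx is None:
            resident.add(page)  # page belongs to no block: load it alone
        else:
            resident.update(blocks[idx])  # load every page in the block
    return faults


# Hypothetical page numbers: a trace seen before the checkpoint, and the
# accesses made after restore (same pages, different order).
trace = [7, 3, 7, 9, 1, 3, 4, 8]
restore = [3, 9, 4, 7, 1, 8]

grouped = build_locality_blocks(trace, block_size=4)   # [[7, 3, 9, 1], [4, 8]]
per_page = build_locality_blocks(trace, block_size=1)  # one page per block

print(count_restore_faults(grouped, restore))   # 2
print(count_restore_faults(per_page, restore))  # 6
```

In this toy run the restore order differs from the recorded order, yet grouping still cuts faults from six to two, because the pages that were accessed together before the checkpoint are again accessed near each other after restore. That is the property the abstract relies on: predicting which pages go together, not which pages come next.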