Fast restore of checkpointed memory using working set estimation

  • Authors: Irene Zhang; Alex Garthwaite; Yury Baskakov; Kenneth C. Barr

  • Affiliations: VMware, Inc., Cambridge, MA, USA (all authors)

  • Venue: Proceedings of the 7th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments
  • Year: 2011

Abstract

For save and restore features to be practical, a saved virtual machine (VM) must return to normal operation quickly. Unfortunately, fetching a saved memory image from persistent storage can be slow, especially as VMs grow in memory size. One way to reduce this time is to restore memory lazily after the VM starts. However, accesses to unrestored memory after the VM starts can degrade performance, sometimes rendering the VM unusable for even longer. Existing performance metrics do not account for this degradation, so they do not reflect what really matters to the user: the time until the VM returns to normal operation. This makes it difficult to compare lazy restore against other approaches. In this paper, we propose both a better metric for evaluating restore techniques and a better scheme for restoring saved VMs. We introduce the time-to-responsiveness metric, which better characterizes user experience while restoring a saved VM by measuring the time until there is no longer a noticeable performance impact on the restoring VM. We propose a new lazy restore technique, called working set restore, that minimizes performance degradation after the VM starts by prefetching the working set. We also introduce a novel working set estimator based on memory tracing that we use to test working set restore, along with an estimator that uses access-bit scanning. We show that working set restore can improve the performance of restoring a saved VM by more than 89% for some workloads.
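
To make the idea of working set restore concrete, the following is a minimal Python sketch rather than the authors' implementation. It models memory as integer page numbers and counts how many demand faults occur after the VM starts, depending on which pages were prefetched. The function names (`estimate_working_set`, `count_demand_faults`) and the "most recent accesses" window are assumptions made for illustration. The window-based estimator in particular is a simple stand-in and is not the memory-tracing or access-bit-scanning estimator described in the abstract.

```python
from typing import Iterable, Sequence, Set


def estimate_working_set(pre_save_accesses: Sequence[int], window: int) -> Set[int]:
    """Hypothetical estimator: pages touched in the last `window` accesses
    before the VM was saved. A stand-in for a real working set estimator."""
    if window <= 0:
        return set()
    return set(pre_save_accesses[-window:])


def count_demand_faults(post_restore_accesses: Iterable[int],
                        prefetched: Set[int]) -> int:
    """Count accesses that hit a page not yet restored.

    Each such access models a demand fault that stalls the running VM
    while the page is fetched from the saved image."""
    resident = set(prefetched)  # pages restored before the VM starts
    faults = 0
    for page in post_restore_accesses:
        if page not in resident:
            faults += 1
            resident.add(page)  # page is fetched on demand, then stays resident
    return faults


# Illustrative traces (made-up page numbers, not measured data).
pre_save = [0, 1, 2, 3, 2, 3, 4, 3, 4]
post_restore = [3, 4, 3, 5, 4]

# Pure lazy restore: nothing is prefetched before the VM starts.
lazy_faults = count_demand_faults(post_restore, set())

# Working set restore: prefetch the estimated working set first.
working_set = estimate_working_set(pre_save, window=4)
ws_faults = count_demand_faults(post_restore, working_set)

print(lazy_faults, ws_faults)  # prints "3 1"
```

In this toy trace, prefetching the estimated working set leaves only the access to page 5 as a demand fault. A real restore path would also keep restoring the remaining pages in the background, which this sketch omits.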