HotSnap: a hot distributed snapshot system for virtual machine cluster

  • Authors:
  • Lei Cui;Bo Li;Yangyang Zhang;Jianxin Li

  • Affiliations:
  • Beihang University, Beijing, China;Beihang University, Beijing, China;Beihang University, Beijing, China;Beihang University, Beijing, China

  • Venue:
  • LISA'13 Proceedings of the 27th international conference on Large Installation System Administration
  • Year:
  • 2013

Abstract

The management of a virtual machine cluster (VMC) is challenging owing to reliability requirements such as non-stop service and failure tolerance. Distributed snapshot of a VMC is one promising approach to supporting system reliability: it allows data center administrators to recover the system from failure and resume execution from an intermediate state rather than the initial state. However, due to the heavyweight nature of virtual machine (VM) technology, applications running in the VMC suffer from long downtime and performance degradation during a snapshot. In addition, the discrepancy in snapshot completion times among VMs causes the TCP backoff problem, resulting in network interruption between two communicating VMs. This paper proposes HotSnap, a VMC snapshot approach designed to take hot distributed snapshots with only milliseconds of system downtime and TCP backoff duration. At the core of HotSnap are the transient snapshot, which saves the minimum instantaneous state in a short time, and the full snapshot, which saves the entire VM state during normal operation. We then design a snapshot protocol to coordinate the individual VM snapshots into a globally consistent state of the VMC. We have implemented HotSnap on QEMU/KVM and conducted several experiments to show its effectiveness and efficiency. Compared to the live-migration-based distributed snapshot technique, which incurs seconds of system downtime and network interruption, HotSnap incurs only tens of milliseconds.
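
The split between a short transient snapshot and a full snapshot taken during normal operation can be illustrated with a toy model. The sketch below is not HotSnap's implementation: the abstract does not say how pages are preserved while the guest keeps running, so copy-on-write is assumed here, and the `ToyVM` class and its methods are hypothetical names introduced only for illustration.

```python
class ToyVM:
    """Toy guest memory: a list of page contents (hypothetical model)."""

    def __init__(self, pages):
        self.pages = list(pages)
        self.snapshot = {}      # page index -> content at snapshot time
        self.unsaved = set()    # pages not yet copied into the snapshot

    def transient_snapshot(self):
        # Short pause: only mark every page as pending; nothing is copied.
        self.snapshot = {}
        self.unsaved = set(range(len(self.pages)))

    def write(self, index, value):
        # Guest write during the full snapshot: preserve the old content
        # first if the page has not been saved yet (assumed copy-on-write).
        if index in self.unsaved:
            self._save(index)
        self.pages[index] = value

    def background_step(self):
        # Full snapshot: copy one pending page while the guest keeps running.
        # Returns True while pages are still pending.
        if self.unsaved:
            self._save(min(self.unsaved))
        return bool(self.unsaved)

    def _save(self, index):
        self.snapshot[index] = self.pages[index]
        self.unsaved.discard(index)


vm = ToyVM(["a", "b", "c"])
vm.transient_snapshot()
vm.write(2, "C")            # guest dirties a page before it was saved
while vm.background_step():
    pass
saved = [vm.snapshot[i] for i in range(3)]
```

In this model the saved image reflects memory as it was at the transient snapshot, even though the guest modified a page afterwards; the pause itself only has to record which pages are pending.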