Optimizing crash dump in virtualized environments

  • Authors:
  • Yijian Huang; Haibo Chen; Binyu Zang

  • Affiliations:
  • Parallel Processing Institute, Fudan University, Shanghai, China; Parallel Processing Institute, Fudan University, Shanghai, China; Parallel Processing Institute, Fudan University, Shanghai, China

  • Venue:
  • Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
  • Year:
  • 2010

Abstract

Crash dump, or core dump, is the typical way to save a memory image on system crash for later offline debugging and analysis. However, on typical server machines with abundant memory, the time taken by a core dump can significantly increase the mean time to repair (MTTR) by delaying reboot-based recovery, while not dumping the failure context for analysis risks recurring crashes from the same problems. In this paper, we propose several optimization techniques for core dump in virtualized environments to shorten the MTTR of consolidated virtual machines (VMs) during crashes. First, we parallelize the crash dump and the reboot of the crashed VM by dynamically reclaiming and allocating memory between the crashed VM and the newly spawned VM. Second, we use the virtual machine management layer to introspect the critical data structures of the crashed VM and filter unused memory out of the dump. Finally, we implement disk I/O rate control between the core dump and the newly spawned VM, following a user-tuned rate control policy, to balance crash dump time against quality of service in the recovery VM. We have implemented a working prototype, Vicover, that optimizes core dump on the system crash of a virtual machine in Xen, minimizing the MTTR of core dump and recovery as a whole. In our experiment on a virtualized TPC-W server, Vicover shortens the downtime caused by crash dump by around 5X.
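
The second and third techniques can be illustrated with a minimal sketch. It is not Vicover's implementation, which the abstract does not detail: the function names, the 4 KiB page size, the set-of-free-pages representation, and the single `dump_share` knob standing in for the user-tuned rate control policy are all assumptions made for illustration.

```python
from typing import List, Set

PAGE_SIZE = 4096  # bytes; assumed page size for this sketch


def pages_to_dump(total_pages: int, free_pages: Set[int]) -> List[int]:
    """Return the page frame numbers worth dumping.

    `free_pages` stands in for what introspection of the crashed VM's
    allocator data structures would report as unused (hypothetical input).
    """
    return [pfn for pfn in range(total_pages) if pfn not in free_pages]


def dump_seconds(num_pages: int, disk_bytes_per_sec: float,
                 dump_share: float) -> float:
    """Estimate dump time when the dump gets `dump_share` of disk bandwidth.

    `dump_share` is a hypothetical policy knob in (0, 1]; the remaining
    bandwidth is left to the newly spawned recovery VM.
    """
    if not 0.0 < dump_share <= 1.0:
        raise ValueError("dump_share must be in (0, 1]")
    return num_pages * PAGE_SIZE / (disk_bytes_per_sec * dump_share)


if __name__ == "__main__":
    used = pages_to_dump(total_pages=8, free_pages={1, 3, 5})
    print("pages to dump:", used)
    # Compare a dump that owns the disk with one limited to half of it.
    for share in (1.0, 0.5):
        t = dump_seconds(len(used), disk_bytes_per_sec=4096 * 5,
                         dump_share=share)
        print(f"share={share}: {t:.1f} s")
```

Under these assumptions, filtering shrinks the amount of data written, while lowering `dump_share` lengthens the dump but leaves more disk bandwidth for services restarting in the recovery VM. That is the trade-off the rate control policy is meant to tune.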