Optimizing crash dump in virtualized environments

Authors:
Yijian Huang;Haibo Chen;Binyu Zang
Affiliations:
Parallel Processing Institute, Fudan University, Shanghai, China;Parallel Processing Institute, Fudan University, Shanghai, China;Parallel Processing Institute, Fudan University, Shanghai, China
Venue:
Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Year:
2010

Abstract

Crash dump, or core dump, is the typical way to save the memory image on a system crash for later offline debugging and analysis. However, on typical server machines, which are likely to have abundant memory, the time spent on core dump can significantly increase the mean time to repair (MTTR) by delaying reboot-based recovery. Yet skipping the dump of the failure context risks recurring crashes from the same problems. In this paper, we propose several optimization techniques for core dump in virtualized environments, in order to shorten the MTTR of consolidated virtual machines (VMs) during crashes. First, we run crash dump in parallel with rebooting the crashed VM, by dynamically reclaiming memory from the crashed VM and allocating it to the newly spawned VM. Second, we use the virtual machine management layer to introspect the critical data structures of the crashed VM and filter unused memory out of the dump. Finally, we implement disk I/O rate control between core dump and the newly spawned VM according to a user-tuned rate-control policy, balancing crash dump time against quality of service in the recovery VM. We have implemented a working prototype, Vicover, that optimizes core dump on a virtual machine crash in Xen, minimizing the MTTR of core dump and recovery as a whole. In our experiment on a virtualized TPC-W server, Vicover shortens the downtime caused by crash dump by around 5X.
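The abstract describes the three techniques only at a high level. The Python sketch below is a toy, single-threaded simulation written under our own assumptions, not Vicover's implementation. A per-page boolean list stands in for the in-use information that introspecting the crashed guest kernel would provide. A token bucket stands in for the user-tuned rate-control policy, since the abstract does not say which policy Vicover uses. A page counter stands in for the memory allocation of the newly spawned recovery VM. The names `TokenBucket`, `DumpResult`, and `dump_and_reclaim` are hypothetical. The sketch shows how the techniques compose: unused pages go to the recovery VM immediately, and each used page is released to it right after being written, so recovery does not wait for the whole dump to finish.

```python
"""Toy simulation of dump/reboot overlap, filtering, and I/O rate control.

All names here are hypothetical illustrations, not Vicover or Xen APIs.
"""
from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes; assumes 4 KiB guest pages


@dataclass
class TokenBucket:
    """Hypothetical rate-control policy: at most `rate` dump bytes per tick."""
    rate: int       # bytes of dump I/O credit added per tick
    capacity: int   # largest burst allowed, in bytes
    tokens: int = 0

    def refill(self) -> None:
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def try_consume(self, nbytes: int) -> bool:
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False


@dataclass
class DumpResult:
    dumped: list[int]          # page frame numbers written to the dump
    ticks: int                 # simulated time steps the dump took
    recovery_pages: list[int]  # recovery VM's page count after each tick


def dump_and_reclaim(in_use: list[bool], bucket: TokenBucket,
                     initial_recovery_pages: int) -> DumpResult:
    """Dump the crashed VM's used pages while handing memory to the recovery VM.

    in_use[pfn] stands in for what introspecting the crashed guest kernel
    would report about page frame `pfn` (assumed input, not a real API).
    """
    if bucket.rate <= 0 or bucket.capacity < PAGE_SIZE:
        raise ValueError("bucket must be able to admit at least one page")

    # Filtering: unused pages are skipped, so their memory is reclaimed at once.
    granted = initial_recovery_pages + in_use.count(False)
    pending = [pfn for pfn, used in enumerate(in_use) if used]

    dumped: list[int] = []
    timeline: list[int] = []
    ticks = 0
    while len(dumped) < len(pending):
        ticks += 1
        bucket.refill()
        # Rate control: write only as many pages as this tick's budget allows.
        while len(dumped) < len(pending) and bucket.try_consume(PAGE_SIZE):
            dumped.append(pending[len(dumped)])  # page written to the dump file
            granted += 1  # its memory is then released to the recovery VM
        timeline.append(granted)
    return DumpResult(dumped, ticks, timeline)


if __name__ == "__main__":
    pages = [True, False, True, True, False, False, True, False]
    result = dump_and_reclaim(pages, TokenBucket(rate=2 * PAGE_SIZE,
                                                 capacity=2 * PAGE_SIZE), 0)
    print(result.dumped)          # [0, 2, 3, 6]
    print(result.ticks)           # 2
    print(result.recovery_pages)  # [6, 8]
```

Lowering `rate` stretches the dump over more ticks. In the setting the abstract describes, that would leave more disk bandwidth to the recovery VM at the cost of a longer dump, though this toy does not model the recovery VM's own I/O.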