Recovery Oriented Computing (ROC): Motivation, Definition, Techniques,
Recovery Oriented Computing (ROC): Motivation, Definition, Techniques,
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
PRDC '04 Proceedings of the 10th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'04)
Memory resource management in VMware ESX server
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
Friendly virtual machines: leveraging a feedback-control model for application adaptation
Proceedings of the 1st ACM/USENIX international conference on Virtual execution environments
Crash Data Collection: A Windows Case Study
DSN '05 Proceedings of the 2005 International Conference on Dependable Systems and Networks
HOTOS'03 Proceedings of the 9th conference on Hot Topics in Operating Systems - Volume 9
Microreboot — A technique for cheap recovery
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Antfarm: tracking processes in a virtual machine environment
ATEC '06 Proceedings of the annual conference on USENIX '06 Annual Technical Conference
Windows XP kernel crash analysis
LISA '06 Proceedings of the 20th conference on Large Installation System Administration
VMM-based hidden process detection and identification using Lycosid
Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Dynamic memory balancing for virtual machines
Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Automated control of multiple virtualized resources
Proceedings of the 4th ACM European conference on Computer systems
A case for secure and scalable hypervisor using safe language
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Hi-index | 0.00 |
Crash dump, or core dump is the typical way to save memory image on system crash for future offline debugging and analysis. However, for typical server machines with likely abundant memory, the time of core dump can significantly increase the mean time to repair (MTTR) by delaying the reboot-based recovery, while not dumping the failure context for analysis would risk recurring crashes on the same problems. In this paper, we propose several optimization techniques for core dump in virtualized environments, in order to shorten the MTTR of consolidated virtual machines during crashes. First, we parallelize the process of crash dump and the process of rebooting the crashed VM, by dynamically reclaiming and allocating memory between the crashed VM and the newly spawned VM. Second, we use the virtual machine management layer to introspect the critical data structures of the crashed VM to filter out the dump of unused memory. Finally, we implement disk I/O rate control between core dump and the newly spawned VM according to user-tuned rate control policy to balance the time of crash dump and quality of services in the recovery VM. We have implemented a working prototype, Vicover, that optimizes core dump on system crash of a virtual machine in Xen, to minimize the MTTR of core dump and recovery as a whole. In our experiment on a virtualized TPC-W server, Vicover shortens the downtime caused by crash dump by around 5X.