COLO: COarse-grained LOck-stepping virtual machines for non-stop service

  • Authors:
  • YaoZu Dong;Wei Ye;YunHong Jiang;Ian Pratt;ShiQing Ma;Jian Li;HaiBing Guan

  • Affiliations:
  • Shanghai Jiao Tong University, China and Intel Asia-Pacific R&D Ltd., China;Shanghai Jiao Tong University, China and Intel Asia-Pacific R&D Ltd., China;Shanghai Jiao Tong University, China;Bromium Inc.;Shanghai Jiao Tong University, China and Intel Asia-Pacific R&D Ltd., China;Shanghai Jiao Tong University, China;Shanghai Jiao Tong University, China

  • Venue:
  • Proceedings of the 4th annual Symposium on Cloud Computing
  • Year:
  • 2013

Abstract

Virtual machine (VM) replication provides a software solution for business continuity and disaster recovery through application-agnostic hardware fault tolerance, replicating the state of a primary VM (PVM) to a secondary VM (SVM) on a different physical node. Unfortunately, current VM replication approaches suffer from excessive overhead, which severely limits their applicability and suitability. In this paper, we leverage a practical property of networked server-client systems: the PVM and SVM can be considered to be in the same state as long as they generate the same responses from the clients' point of view, and we exploit this to optimize performance. To this end, we propose a generic and highly efficient non-stop service solution named "COLO" (COarse-grained LOck-stepping virtual machine), which uses on-demand VM replication. COLO monitors the output responses of the PVM and SVM, and treats the SVM as a valid replica of the PVM according to the similarity of their outputs. If the responses do not match, the commit of the network response is withheld until the PVM's state has been synchronized to the SVM. Hence, we ensure that the system is always capable of failing over to the SVM. Although non-determinism may leave the SVM's internal state different from that of the PVM, that state is equally valid and remains consistent from external observation. Unlike earlier instruction-level lock-stepping deterministic execution approaches, COLO can easily support multi-processor (MP) workloads with satisfactory performance. Results show that COLO significantly outperforms existing approaches, particularly on server-client workloads such as online databases and web server applications.
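
The output-comparison rule described in the abstract can be illustrated with a toy sketch. This is not COLO's implementation: `ToyVM`, `ColoPair`, `serve`, and the `scratch` field are hypothetical names introduced only for illustration, and a real system would compare network packets and synchronize full VM memory and device state rather than copy a small Python object. The sketch only shows the control flow: both replicas process the same request, a matching response is released immediately, and a mismatch triggers a synchronization from the PVM to the SVM before the response is released.

```python
import copy
from dataclasses import dataclass


@dataclass
class ToyVM:
    """Hypothetical stand-in for a VM: client-visible state plus internal scratch data."""
    count: int = 0    # state that shows up in responses
    scratch: int = 0  # internal detail that never appears in a response

    def handle(self, request: str) -> str:
        # Apply the request and return the response a client would see.
        if request == "incr":
            self.count += 1
        return f"count={self.count}"


@dataclass
class ColoPair:
    """Toy primary/secondary pair that synchronizes only when outputs differ."""
    pvm: ToyVM
    svm: ToyVM
    syncs: int = 0  # number of on-demand synchronizations performed

    def serve(self, request: str) -> str:
        primary_out = self.pvm.handle(request)
        secondary_out = self.svm.handle(request)
        if primary_out != secondary_out:
            # Outputs diverged: withhold the response until the SVM has
            # been made a copy of the PVM (assumed here to be a deep copy).
            self.svm = copy.deepcopy(self.pvm)
            self.syncs += 1
        # At this point the SVM either produced the same response or holds
        # the PVM's state, so releasing the PVM's response is safe.
        return primary_out
```

In this sketch two replicas whose `scratch` values differ still count as valid replicas of each other as long as their responses match, which mirrors the abstract's point that a different internal state is acceptable when it is consistent from external observation. Synchronizing only on a mismatch is what makes the replication on-demand.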