Checkpoint-recovery based virtual machine (VM) replication is an attractive technique for providing high availability to VM installations. It provides seamless failover for the entire software stack executed in the VM, regardless of the application or the underlying operating system (OS); it runs on commodity hardware; and it is inherently capable of dealing with the shared-memory non-determinism of symmetric multiprocessing (SMP) configurations. Several studies have aimed at alleviating the overhead of replication. However, the network performance of the basic replication mechanism remains extremely poor because of its consistency requirements: outgoing messages must be withheld until the execution state that produced them has been checkpointed.

In this paper we revisit the replication protocol and extend it with speculative communication. Speculative communication silently acknowledges the VM's TCP packets, allowing the guest's TCP stack to keep transmitting without exposing the messages to clients before the corresponding execution state has been checkpointed to the backup host. Furthermore, we propose replication aware congestion control, an extension to the guest's TCP stack that aggressively fills the VMM's replication buffer so that speculative packets can be backed up and released to clients earlier. We observe up to an order of magnitude improvement in bulk data transfer with speculative communication, and close to native VM network performance when replication awareness is enabled in the guest OS. We report results from both micro- and application-level benchmarks.
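The output-withholding rule and the speculative acknowledgement can be illustrated with a small Python model. This is a hypothetical sketch, not the authors' implementation: the names `Packet`, `ReplicationBuffer`, and `run_epoch`, the fixed `window` standing in for the guest's TCP window, the per-epoch `capacity`, and the assumptions that every checkpoint commits (and every released packet is acknowledged by the client) before the next epoch begins are all simplifications introduced here. In both modes packets reach clients only after the backup confirms the checkpoint; speculation changes only when the guest sees an ACK.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Packet:
    """A guest TCP segment (hypothetical model): first sequence number and length."""
    seq: int
    length: int

    @property
    def end(self) -> int:
        # TCP acknowledges the next expected byte, i.e. seq + length.
        return self.seq + self.length


class ReplicationBuffer:
    """Hypothetical VMM-side output buffer for checkpoint-based VM replication.

    Packets produced during an epoch stay hidden from external clients until
    the backup host confirms the checkpoint of that epoch. With speculative
    communication, the VMM acknowledges each buffered segment to the guest at
    once, so the guest TCP stack need not wait for the real client ACK.
    """

    def __init__(self, speculative: bool) -> None:
        self.speculative = speculative
        self.epoch = 0
        self.pending: dict[int, list[Packet]] = {}  # epoch -> withheld packets
        self.released: list[Packet] = []            # packets visible to clients

    def guest_send(self, pkt: Packet) -> int | None:
        """Withhold a guest packet; return a speculative ACK number, or None."""
        self.pending.setdefault(self.epoch, []).append(pkt)
        return pkt.end if self.speculative else None

    def end_epoch(self) -> int:
        """Close the current epoch, whose state is now checkpointed to the backup."""
        closed = self.epoch
        self.epoch += 1
        return closed

    def backup_acked(self, epoch: int) -> list[Packet]:
        """The backup confirmed the checkpoint of `epoch`: release its packets."""
        out = self.pending.pop(epoch, [])
        self.released.extend(out)
        return out


def run_epoch(buf: ReplicationBuffer, next_seq: int, window: int,
              capacity: int, mss: int = 1460) -> int:
    """Let the guest transmit for one epoch and return the next sequence number.

    `window` caps unacknowledged bytes in flight (a stand-in for the TCP window);
    `capacity` caps the bytes the guest can produce in one epoch.
    """
    unacked = 0  # assumption: all client ACKs for earlier epochs have arrived
    sent = 0
    while sent + mss <= capacity and unacked + mss <= window:
        ack = buf.guest_send(Packet(next_seq, mss))
        next_seq += mss
        sent += mss
        unacked += mss
        if ack is not None:
            unacked = next_seq - ack  # speculative ACK covers everything sent
    # Assumption: the checkpoint commits before the next epoch starts.
    buf.backup_acked(buf.end_epoch())
    return next_seq


if __name__ == "__main__":
    for speculative in (False, True):
        buf = ReplicationBuffer(speculative)
        seq = 0
        for _ in range(3):
            seq = run_epoch(buf, seq, window=65536, capacity=655360)
        print(speculative, sum(p.length for p in buf.released))
    # False 192720
    # True 1962240
```

With these made-up parameters, the non-speculative sender is capped by its window at 44 segments (64,240 bytes) per epoch, while the speculative sender is limited only by the 655,360-byte per-epoch capacity (448 segments). The roughly tenfold gap is a product of the chosen numbers, not a reproduction of the paper's measurements, and the sketch leaves replication aware congestion control unmodeled.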