This paper describes the design and implementation of SecondSite, a cloud-based service for disaster tolerance. SecondSite extends the Remus virtualization-based high-availability system by allowing groups of virtual machines to be replicated across data centers over wide-area Internet links. The goal of the system is to commodify availability, exposing it as a simple tick box when configuring a new virtual machine. To achieve this in the wide area, we had to tackle three related issues without compromising transparency or consistency: replication traffic bandwidth, reliable failure detection across geographic regions, and traffic redirection over a wide-area network.