Optimistic recovery in distributed systems
ACM Transactions on Computer Systems (TOCS)
Providing high availability using lazy replication
ACM Transactions on Computer Systems (TOCS)
Managing update conflicts in Bayou, a weakly connected replicated storage system
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Grapevine: an exercise in distributed computing
Communications of the ACM
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
SnapMirror: File-System-Based Asynchronous Mirroring for Disaster Recovery
FAST '02 Proceedings of the Conference on File and Storage Technologies
Performance debugging for distributed systems of black boxes
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Remus: high availability via asynchronous virtual machine replication
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Smoke and mirrors: reflecting files at a geographically remote location without loss of performance
FAST '09 Proccedings of the 7th conference on File and storage technologies
Tolerating latency in replicated state machines through client speculation
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
A self-organized, fault-tolerant and scalable replication scheme for cloud storage
Proceedings of the 1st ACM symposium on Cloud computing
Disaster recovery as a cloud service: economic benefits & deployment challenges
HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
Capo: recapitulating storage for virtual desktops
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
DepSky: dependable and secure storage in a cloud-of-clouds
Proceedings of the sixth conference on Computer systems
FAST'04 Proceedings of the 3rd USENIX conference on File and storage technologies
RemusDB: transparent high availability for database systems
The VLDB Journal — The International Journal on Very Large Data Bases
SecondSite: disaster tolerance as a service
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Hybrid cloud support for large scale analytics and web processing
WebApps'12 Proceedings of the 3rd USENIX conference on Web Application Development
Yank: enabling green data centers to pull the plug
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
Disaster Recovery (DR) is a desirable feature for all enterprises, and a crucial one for many. However, adoption of DR remains limited due to the stark tradeoffs it imposes. To recover an application to the point of crash, one is limited by financial considerations, substantial application overhead, or minimal geographical separation between the primary and recovery sites. In this paper, we argue for cloud-based DR and pipelined synchronous replication as an antidote to these problems. Cloud hosting promises economies of scale and on-demand provisioning that are a perfect fit for the infrequent yet urgent needs of DR. Pipelined synchrony addresses the impact of WAN replication latency on performance, by efficiently overlapping replication with application processing for multi-tier servers. By tracking the consequences of the disk modifications that are persisted to a recovery site all the way to client-directed messages, applications realize forward progress while retaining full consistency guarantees for client-visible state in the event of a disaster. PipeCloud, our prototype, is able to sustain these guarantees for multi-node servers composed of black-box VMs, with no need of application modification, resulting in a perfect fit for the arbitrary nature of VM-based cloud hosting. We demonstrate disaster failover to the Amazon EC2 platform, and show that PipeCloud can increase throughput by an order of magnitude and reduce response times by more than half compared to synchronous replication, all while providing the same zero data loss consistency guarantees.