Geographically Distributed System for Catastrophic Recovery

Authors:
Affiliations:
Venue: LISA '02: Proceedings of the 16th USENIX Conference on System Administration
Year: 2002

Abstract

This paper presents the results of a proof-of-concept implementation from an ongoing project. The project aims to create a cost-effective method for geographically distributing the critical portions of a data center, along with methods that make the transition to these backup services quick and accurate. It emphasizes data integrity over timeliness and prioritizes the services to be offered at the remote site. The paper explores the trade-off of using common clustering techniques to distribute a backup system over a significant geographical area by relaxing the timing requirements of the cluster technologies at a cost of fidelity. As a consequence, the fail-over node is not suitable for high-availability use: some loss of data is expected, and fail-over time is measured in minutes, not seconds. Asynchronous mirroring, exploitation of file commonality in file updates, IP Quality of Service, and network efficiency mechanisms are the enabling technologies that keep the communications requirements low in bandwidth. Exploiting file commonality in file updates decreases the overall communications requirement. IP Quality of Service mechanisms guarantee a minimum available bandwidth so that data updates complete successfully. Traffic shaping in conjunction with asynchronous mirroring makes efficient use of network bandwidth: it allows a maximum bandwidth to be set, which minimizes the impact on the existing infrastructure and lowers the service level agreement required if shared media is used. The resulting disaster recovery site allows off-line verification of disaster recovery procedures and quick recovery of critical data center services. It is more cost-effective than a transactionally aware replication of the data center and more comprehensive than a commercial data replication solution used exclusively for data vaulting.
The paper concludes with a discussion of the empirical results of a proof-of-concept implementation.
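The abstract names two levers for keeping WAN usage low: sending only the parts of a file that changed (file commonality) and capping the sending rate (traffic shaping). A minimal Python sketch of both ideas follows. It assumes fixed-size blocks compared by SHA-256 digest, which is a simplification (tools such as rsync also use a rolling checksum so shifted content still matches), and a token-bucket shaper with a chosen byte rate. The names `block_digests`, `changed_blocks`, `apply_delta`, `TokenBucket`, and `replicate`, and the 4 KiB block size, are hypothetical and are not taken from the paper's implementation.

```python
from __future__ import annotations

import hashlib
import time
from typing import Callable, Optional

BLOCK_SIZE = 4096  # hypothetical block size; the paper does not specify one


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """SHA-256 digest of each fixed-size block (computed on the replica)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(replica_digests: list[str], new: bytes,
                   block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: bytes} for blocks of `new` the replica lacks.

    Blocks whose digest matches the replica's copy are skipped, so only
    modified or appended blocks need to cross the WAN link.
    """
    delta: dict[int, bytes] = {}
    for idx, digest in enumerate(block_digests(new, block_size)):
        if idx >= len(replica_digests) or replica_digests[idx] != digest:
            delta[idx] = new[idx * block_size:(idx + 1) * block_size]
    return delta


def apply_delta(old: bytes, delta: dict[int, bytes], new_len: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the new file on the replica from its old copy plus the delta."""
    n_blocks = -(-new_len // block_size)  # ceiling division
    parts = [
        delta[idx] if idx in delta
        else old[idx * block_size:(idx + 1) * block_size]  # unchanged block
        for idx in range(n_blocks)
    ]
    return b"".join(parts)[:new_len]


class TokenBucket:
    """Token-bucket shaper: refills at `rate_bps` bytes/s, bursts up to `burst_bytes`."""

    def __init__(self, rate_bps: float, burst_bytes: float,
                 now: Optional[float] = None) -> None:
        self.rate = rate_bps
        self.capacity = burst_bytes
        self.tokens = burst_bytes
        self.last = time.monotonic() if now is None else now

    def reserve(self, nbytes: int, now: float) -> float:
        """Charge `nbytes` and return how many seconds to wait before sending."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        self.tokens -= nbytes  # may go negative: the deficit is repaid by waiting
        return 0.0 if self.tokens >= 0 else -self.tokens / self.rate


def replicate(replica_digests: list[str], new: bytes, bucket: TokenBucket,
              send: Callable[[int, bytes], None],
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Ship changed blocks to the replica, paced by the token bucket.

    Returns the new file length, which the replica needs for apply_delta().
    """
    delta = changed_blocks(replica_digests, new)
    for idx in sorted(delta):
        wait = bucket.reserve(len(delta[idx]), clock())
        if wait > 0:
            sleep(wait)  # hold back so the link never exceeds rate_bps on average
        send(idx, delta[idx])
    return len(new)
```

In this sketch the replica sends its block digests, the sender ships only the differing blocks, and the replica calls `apply_delta` with the returned length. `rate_bps` stands in for the traffic-shaping ceiling; the minimum-bandwidth guarantee from IP Quality of Service would be configured in the network rather than in this code.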