Data Replication Strategies for Fault Tolerance and Availability on Commodity Clusters

Authors:
Cristiana Amza;Alan L. Cox;Willy Zwaenepoel
Venue:
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
Year:
2000



Abstract

Recent work has shown the advantages of using persistent memory for transaction processing. In particular, the Vista transaction system uses recoverable memory to avoid disk I/O, thus improving performance by several orders of magnitude. In such a system, however, the data is safe when a node fails, but unavailable until it recovers, because the data is kept in only one memory.

In contrast, our work uses data replication to provide both reliability and data availability while still maintaining very high transaction throughput. We investigate four possible designs for a primary-backup system, using a cluster of commodity servers connected by a write-through capable system area network (SAN). We show that logging approaches outperform mirroring approaches, even when communicating more data, because of their better locality. Finally, we show that the best logging approach also scales well to small shared-memory multiprocessors.
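The contrast between mirroring and logging can be illustrated with a toy model. The sketch below is not one of the paper's four designs; it only shows the general idea under simplifying assumptions. The class names, the 8-byte record header, and the in-process "backup" objects are all hypothetical stand-ins for remote memory reached over a SAN. Mirroring writes each update to the same offset on the backup, so remote writes are scattered; logging appends a record to the tail of a log, so remote writes are sequential even though more bytes are sent.

```python
from dataclasses import dataclass, field

HEADER_BYTES = 8  # hypothetical per-record header (offset + length)


@dataclass
class MirrorBackup:
    """Backup keeps a full copy; each update is written through in place."""
    data: bytearray
    remote_offsets: list = field(default_factory=list)  # where each remote write landed
    bytes_sent: int = 0

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload
        self.remote_offsets.append(offset)
        self.bytes_sent += len(payload)


@dataclass
class LogBackup:
    """Backup keeps an append-only redo log and replays it on failover."""
    size: int
    log: list = field(default_factory=list)
    remote_offsets: list = field(default_factory=list)  # log tail position per write
    bytes_sent: int = 0

    def append(self, offset, payload):
        self.remote_offsets.append(self.bytes_sent)  # always the current tail
        self.log.append((offset, bytes(payload)))
        self.bytes_sent += HEADER_BYTES + len(payload)

    def recover(self):
        # Replay the log in order to rebuild the database image.
        data = bytearray(self.size)
        for offset, payload in self.log:
            data[offset:offset + len(payload)] = payload
        return data


def run_transaction(updates, mirror, logger):
    """Apply the same updates to both toy backups for comparison."""
    for offset, payload in updates:
        mirror.write(offset, payload)
        logger.append(offset, payload)


updates = [(900, b"cc"), (10, b"aaaa"), (500, b"b")]
mirror = MirrorBackup(bytearray(1024))
logger = LogBackup(1024)
run_transaction(updates, mirror, logger)
```

In this toy run both backups end up with the same database image, but the mirror's remote writes jump between offsets 900, 10, and 500, while the log's writes advance monotonically from its tail. The log also sends more bytes because of the assumed record headers, which mirrors the trade-off the abstract describes.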