A protocol for reconciling recovery and high-availability in replicated databases

Authors:
J. E. Armendáriz-Iñigo;F. D. Muñoz-Escoí;H. Decker;J. R. Juárez-Rodríguez;J. R. González de Mendívil
Affiliations:
Universidad Pública de Navarra, Pamplona, Spain;Instituto Tecnológico de Informática, Valencia, Spain;Instituto Tecnológico de Informática, Valencia, Spain;Universidad Pública de Navarra, Pamplona, Spain;Universidad Pública de Navarra, Pamplona, Spain
Venue:
ISCIS'06 Proceedings of the 21st international conference on Computer and Information Sciences
Year:
2006

Abstract

We describe a recovery protocol that boosts availability, fault tolerance and performance by enabling failed network nodes to resume an active role immediately after they start recovering. The protocol is designed to work in tandem with middleware-based eager update-everywhere strategies and related group communication systems. The latter provide view synchrony, i.e., knowledge about the currently reachable nodes and about the status of messages delivered by faulty and alive nodes. This enables a fast replay of missed updates, which defines a dynamic database recovery partition and thus speeds up the recovery of failed nodes. These nodes, together with the rest of the network, may seamlessly continue to process transactions even before their recovery has completed. We specify the protocol in terms of the procedures executed on every message and event of interest, and we outline a correctness proof.
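
The idea of a recovery partition can be illustrated with a small sketch. It is not the protocol specified in the paper. It assumes that missed updates are totally ordered by a version number, that the set of data items they touch forms the recovery partition, and that a recovering node may serve any item outside that set while replay is still in progress. The names Update, Replica, start_recovery, can_access and replay_next are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Update:
    version: int   # assumed position in the total order of committed updates
    item: str      # identifier of the data item that was written
    value: object  # value written by the update


@dataclass
class Replica:
    store: dict = field(default_factory=dict)
    applied: int = 0                                       # last version applied locally
    pending: list = field(default_factory=list)            # missed updates not yet replayed
    recovery_partition: set = field(default_factory=set)   # items that are still stale

    def start_recovery(self, missed):
        """On rejoining, mark every item touched by a missed update as stale."""
        self.pending = sorted(missed, key=lambda u: u.version)
        self.recovery_partition = {u.item for u in self.pending}

    def can_access(self, item):
        """Items outside the recovery partition are usable during recovery."""
        return item not in self.recovery_partition

    def replay_next(self):
        """Apply the oldest missed update and release its item once it is current."""
        update = self.pending.pop(0)
        self.store[update.item] = update.value
        self.applied = update.version
        # The item stays blocked while a later missed update still targets it.
        if all(u.item != update.item for u in self.pending):
            self.recovery_partition.discard(update.item)


# Hypothetical scenario: the node failed after version 3 and missed versions 4 to 6.
replica = Replica(store={"x": 1, "y": 1, "z": 1}, applied=3)
replica.start_recovery([Update(4, "x", 10), Update(5, "y", 20), Update(6, "x", 30)])
```

In this scenario the item z is available as soon as recovery starts, y becomes available after the second replayed update, and x only after the third, so the partition shrinks as replay proceeds instead of blocking the whole database until recovery ends.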