The failure and recovery problem for replicated databases

  • Authors:
  • Philip A. Bernstein; Nathan Goodman

  • Venue:
  • PODC '83 Proceedings of the second annual ACM symposium on Principles of distributed computing
  • Year:
  • 1983

Abstract

A replicated database is a distributed database in which some data items are stored redundantly at multiple sites. The main goal is to improve system reliability. By storing critical data at multiple sites, the system can operate even though some sites have failed. However, few distributed database systems support replicated data, because it is difficult to manage as sites fail and recover. A replicated data algorithm has two parts. One is a discipline for reading and writing data item copies. The other is a concurrency control algorithm for synchronizing those operations. The read-write discipline ensures that if one transaction writes logical data item x, and another transaction reads or writes x, there is some physical manifestation of that logical conflict. The concurrency control algorithm synchronizes physical conflicts; it knows nothing about logical conflicts. In a correct replicated data algorithm, the physical manifestation of conflicts must be strong enough so that synchronizing physical conflicts is sufficient for correctness. This paper presents a theory for proving the correctness of algorithms that manage replicated data. The theory is an extension of serializability theory. We apply it to three replicated data algorithms: Gifford's “quorum consensus” algorithm, Eager and Sevcik's “missing writes” algorithm, and Computer Corporation of America's “available copies” algorithm.
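As an illustrative aside (not part of the paper), the read-write discipline in Gifford-style quorum consensus can be sketched in a few lines: each copy carries a version number, a read collects a read quorum of r copies and returns the highest-versioned value, a write installs a new version at a write quorum of w copies, and the constraint r + w > n forces every read quorum to intersect every write quorum, which is one way the "physical manifestation" of a logical read-write conflict mentioned in the abstract can be realized. The class and parameter names below are hypothetical, and the sketch ignores the concurrency control layer entirely.

```python
# Hedged sketch of quorum-consensus reads and writes over n copies.
# Names (QuorumReplica, read_quorum, write_quorum) are illustrative, not from the paper.

class QuorumReplica:
    def __init__(self, n_copies, read_quorum, write_quorum, initial=None):
        # r + w > n guarantees every read quorum intersects every write quorum,
        # so a read always sees at least one copy holding the latest write.
        assert read_quorum + write_quorum > n_copies
        self.r = read_quorum
        self.w = write_quorum
        # Each copy is a (version, value) pair; all copies start identical.
        self.copies = [(0, initial) for _ in range(n_copies)]

    def read(self, available):
        # 'available' lists the copy indices we can reach; we need at least r of them.
        quorum = available[:self.r]
        assert len(quorum) >= self.r, "cannot assemble a read quorum"
        # Return the value carrying the highest version number in the quorum.
        return max((self.copies[i] for i in quorum), key=lambda vv: vv[0])[1]

    def write(self, value, available):
        quorum = available[:self.w]
        assert len(quorum) >= self.w, "cannot assemble a write quorum"
        # The new version must dominate every version seen in the write quorum.
        new_version = 1 + max(self.copies[i][0] for i in quorum)
        for i in quorum:
            self.copies[i] = (new_version, value)


# Example: 5 copies with read quorum 3 and write quorum 3 (3 + 3 > 5).
db = QuorumReplica(5, 3, 3, initial="v0")
db.write("v1", available=[0, 1, 2])   # copies 3 and 4 remain stale
print(db.read(available=[2, 3, 4]))   # prints "v1": the quorums intersect at copy 2
```

The point of the example is only the intersection argument: even when a read quorum contains stale copies, version numbers let the reader pick out the value installed by the most recent write, because the quorum sizes guarantee at least one up-to-date copy is present.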