A locking protocol for resource coordination in distributed databases

  • Authors:
  • Daniel A. Menasce;Gerald J. Popek;Richard R. Muntz

  • Affiliations:
  • Univ. of California at Los Angeles, Los Angeles;Univ. of California at Los Angeles, Los Angeles;Univ. of California at Los Angeles, Los Angeles

  • Venue:
  • ACM Transactions on Database Systems (TODS)
  • Year:
  • 1980

Quantified Score

Hi-index 0.03

Visualization

Abstract

A locking protocol to coordinate access to a distributed database and to maintain system consistency throughout normal and abnormal conditions is presented. The proposed protocol is robust in the face of crashes of any participating site, as well as communication failures. Recovery from any number of failures during normal operation or any of the recovery stages is supported. Recovery is done in such a way that maximum forward progress is achieved by the recovery procedures. Integration of virtually any locking discipline including predicate lock methods is permitted by this protocol. The locking algorithm operates, and operates correctly, when the network is partitioned, either intentionally or by failure of communication lines. Each partition is able to continue with work local to it, and operation merges gracefully when the partitions are reconnected.A subroutine of the protocol, that assures reliable communication among sites, is shown to have better performance than two-phase commit methods. For many topologies of interest, the delay introduced by the overall protocol is not a direct function of the size of the network. The communications cost is shown to grow in a relatively slow, linear fashion with the number of sites participating in the transaction. An informal proof of the correctness of the algorithm is also presented in this paper.The algorithm has as its core a centralized locking protocol with distributed recovery procedures. A centralized controller with local appendages at each site coordinates all resource control, with requests initiated by application programs at any site. However, no site experiences undue load. Recovery is broken down into three disjoint mechanisms: for single node recovery, merge of partitions, and reconstruction of the centralized controller and tables. The disjointness of the mechanisms contributes to comprehensibility and ease of proof.The paper concludes with a proposal for an extension aimed at optimizing operation of the algorithm to adapt to highly skewed distributions of activity. The extension applies nicely to interconnected computer networks.