Reduced Overhead Logging for Rollback Recovery in Distributed Shared Memory

  • Authors:
  • G. Suri;B. Jannsens

  • Affiliations:
  • -;-

  • Venue:
  • FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
  • Year:
  • 1995

Quantified Score

Hi-index 0.00

Visualization

Abstract

Abstract: Rollback techniques that use message logging and deterministic replay can be used in parallel systems to recover a failed node without involving other nodes. Distributed shared memory (DSM) systems cannot directly apply message-passing logging techniques because they use inherently nondeterministic asynchronous communication. This paper presents new logging schemes that reduce the typically high overhead for logging in DSM. Our algorithm for sequentially consistent systems tracks rather than logs accesses to shared memory. In an extension of this method to lazy release consistency, the per-access overhead of tracking has been completely eliminated. Measurements with parallel applications show a significant reduction in failure-free overhead.