Optimistic Recovery in Multi-Threaded Distributed Systems

  • Authors:
  • Om P. Damani;Ashis Tarafdar;Vijay K. Garg

  • Venue:
  • SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems
  • Year:
  • 1999

Abstract

The problem of recovering distributed systems from crash failures has been widely studied in the context of traditional non-threaded processes. However, extending those solutions to the multi-threaded scenario presents new problems. We identify and address these problems for optimistic logging protocols. There are two natural extensions of optimistic logging protocols to the multi-threaded scenario. The first extension is "process-centric", where the points of internal non-determinism caused by threads are logged. The second extension is "thread-centric", where each thread is treated as a separate process. The process-centric approach suffers from false causality, while the thread-centric approach suffers from high causality tracking overhead. By observing that the granularity of failures can be different from the granularity of rollbacks, we design a new "balanced" approach which incurs low causality tracking overhead and also eliminates false causality.
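The trade-off between the two extensions can be illustrated with a small sketch. It is not the paper's protocol: the `Tracker` class, the `run` helper, and the thread names `P.t1`, `P.t2`, `Q.t1`, `R.t1` are all hypothetical, and dependencies are modeled as plain sets piggybacked on messages rather than as the dependency vectors a real optimistic logging protocol would use. The same two-message scenario is replayed once with processes as the tracking unit and once with threads as the tracking unit.

```python
class Tracker:
    """Tracks transitive dependencies between tracking units (hypothetical helper)."""

    def __init__(self, units):
        # Every unit starts out depending only on itself.
        self.deps = {u: {u} for u in units}

    def send(self, sender):
        # A message carries a snapshot of the sender's current dependencies.
        return frozenset(self.deps[sender])

    def receive(self, receiver, piggyback):
        # The receiver inherits everything the sender depended on.
        self.deps[receiver] |= piggyback


def run(unit_of):
    """Replay a fixed scenario; unit_of maps a thread name to its tracking unit."""
    threads = ["P.t1", "P.t2", "Q.t1", "R.t1"]
    tracker = Tracker({unit_of(t) for t in threads})

    # Message 1: thread Q.t1 sends to thread P.t1.
    m1 = tracker.send(unit_of("Q.t1"))
    tracker.receive(unit_of("P.t1"), m1)

    # Message 2: thread P.t2, which never interacted with P.t1, sends to R.t1.
    m2 = tracker.send(unit_of("P.t2"))
    tracker.receive(unit_of("R.t1"), m2)

    return tracker.deps[unit_of("R.t1")], len(tracker.deps)


# Process-centric: all threads of a process share one dependency set.
process_deps, process_units = run(lambda t: t.split(".")[0])

# Thread-centric: every thread is its own tracking unit.
thread_deps, thread_units = run(lambda t: t)

print("process-centric:", sorted(process_deps), "units tracked:", process_units)
print("thread-centric: ", sorted(thread_deps), "units tracked:", thread_units)
```

In the process-centric run, R ends up depending on Q even though the sending thread never saw Q's message, which is the false causality the abstract refers to. In the thread-centric run that dependency disappears, but the number of tracked units grows with the number of threads instead of the number of processes, which is the overhead side of the trade-off.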