A Survey of Recoverable Distributed Shared Virtual Memory Systems

  • Authors:
  • Christine Morin; Isabelle Puaut

  • Venue:
  • IEEE Transactions on Parallel and Distributed Systems
  • Year:
  • 1997

Abstract

Distributed Shared Virtual Memory (DSVM) systems provide a shared memory abstraction on distributed memory architectures. Such systems ease parallel application programming because the shared-memory programming model is often more natural than the message-passing paradigm. However, the probability of failure of a DSVM increases with the number of sites. Thus, fault tolerance mechanisms must be implemented in order to allow processes to continue their execution in the event of a failure. This paper gives an overview of recoverable DSVMs (RDSVMs) that provide a checkpointing mechanism to restart parallel computations in the event of a site failure.
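
The abstract's claim that failure probability grows with the number of sites can be illustrated with a small sketch. It is not taken from the paper: it assumes that sites fail independently, each with the same probability over some interval of interest, and that the failure of any single site counts as a failure of the whole DSVM. The function name and the sample values are hypothetical.

```python
def system_failure_probability(p_site: float, n_sites: int) -> float:
    """Probability that at least one of n_sites fails.

    Assumption (not from the paper): site failures are independent and
    each site fails with probability p_site over the interval considered.
    """
    if not 0.0 <= p_site <= 1.0:
        raise ValueError("p_site must be between 0 and 1")
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    # All sites survive with probability (1 - p_site) ** n_sites;
    # the system fails if that does not happen.
    return 1.0 - (1.0 - p_site) ** n_sites


if __name__ == "__main__":
    # Illustrative per-site failure probability of 1% (hypothetical value).
    for n in (1, 10, 100, 1000):
        print(n, system_failure_probability(0.01, n))
```

Under these assumptions, a per-site failure probability of 1% already gives roughly a 63% chance that at least one of 100 sites fails, which is the kind of growth that motivates checkpointing in larger configurations.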