Using agent wills to provide fault-tolerance in distributed shared memory systems

  • Authors:
  • Antony Rowstron

  • Affiliations:
  • Microsoft Research Ltd., Cambridge, UK

  • Venue:
  • EURO-PDP'00 Proceedings of the 8th Euromicro conference on Parallel and distributed processing
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper we describe how we use mobile objects to provide distributed programs coordinating through a persistent distributed shared memory (DSM) with tolerance to sudden agent failure, and use the increasingly popular Linda-like tuple space languages as an example for implementation of the concept. In programs coordinating and communicating through a DSM a data structure is shared between multiple agents, and the agents update the shared structure directly. However, if an agent should suddenly fail it is often hard for the agents to make the data structures consistent with the new application state. For example consider if a data structure contains a list of active agents. In such a case, transactions can be used when adding and removing agent names from the list ensuring that that the data structure is consistent and does not become corrupted should an agent fail. However, if failure of the agent occurs after the name has been added, how does the application ensure the list is correct? We argue that using mobile objects we can provide wills for the agents to effectively enable them to ensure the shared data structure is application consistent, even once they have failed. We show how we have integrated the use of agent wills into a Linda system and show that we have not increased the complexity of program writing. The integration is simple and general, does not alter the underlying semantics of the operations performed in the will and the use of mobility is transparent to the programmer.