A comparative study at the logical level of centralised and distributed recovery in clusters

  • Authors:
  • Andrew Maloney; Andrzej Goscinski

  • Affiliations:
  • School of Information Technology, Deakin University, Geelong, Vic, Australia; School of Information Technology, Deakin University, Geelong, Vic, Australia

  • Venue:
  • ICA3PP'05 Proceedings of the 6th international conference on Algorithms and Architectures for Parallel Processing
  • Year:
  • 2005

Abstract

Cluster systems are becoming more prevalent in today’s computer society, and users are beginning to request that these systems be reliable. Currently, most clusters have been designed to provide high performance at the cost of providing little to no reliability. To address this, this report examines how a recovery facility, based on either a centralised or a distributed approach, could be implemented in a cluster that is supported by a checkpointing facility. This recovery facility can then recover failed user processes by using checkpoints of those processes taken during failure-free execution.