DDG Task Recovery for Cluster Computing

  • Authors:
  • Giang T. Nguyen; Ladislav Hluchý; Viet D. Tran; M. Kotocova

  • Venue:
  • PPAM '01 Proceedings of the th International Conference on Parallel Processing and Applied Mathematics-Revised Papers
  • Year:
  • 2001

Abstract

This paper presents a solution to the problem of transparently recovering asynchronous distributed computations on clusters of workstations when a fault occurs on a node. If the system has fault-tolerant features, it can survive the fault and continue its computations. Performance degradation is unavoidable when no hardware redundancy is available. It is a significant advantage if a long-running application can restart from a checkpoint instead of restarting the whole computation. This paper presents the fault-tolerance feature of the DDG environment, which targets cluster systems without spare hardware.
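The checkpoint-restart idea described in the abstract can be illustrated with a small simulation. This is a generic sketch, not the DDG environment's actual recovery mechanism. The `Task` record, the `run` and `recover` functions, the checkpoint interval and the least-loaded reassignment policy are all hypothetical. When a node fails and no spare is available, its tasks move to surviving nodes and resume from their last checkpoint. Only the work done since that checkpoint is lost, but the survivors now carry extra load.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task record: total work units, current progress,
    # progress saved at the last checkpoint, and the node running it.
    name: str
    total: int
    done: int = 0
    checkpoint: int = 0
    node: str = ""


def run(task: Task, units: int, interval: int) -> None:
    """Advance a task by `units` of work, checkpointing every `interval` units."""
    for _ in range(units):
        if task.done >= task.total:
            break
        task.done += 1
        if task.done % interval == 0:
            task.checkpoint = task.done  # simulated write of saved state


def recover(tasks: list[Task], failed: str, nodes: list[str]) -> list[str]:
    """Move tasks off a failed node onto survivors, resuming from checkpoints.

    With no spare node, displaced tasks are spread over the remaining nodes,
    which then carry more load (the unavoidable performance degradation).
    Returns the list of surviving nodes.
    """
    survivors = [n for n in nodes if n != failed]
    if not survivors:
        raise RuntimeError("no surviving nodes")
    for task in tasks:
        if task.node == failed:
            # Hypothetical policy: pick the survivor running the fewest tasks.
            load = {n: sum(1 for t in tasks if t.node == n) for n in survivors}
            task.node = min(survivors, key=lambda n: load[n])
            # Work done after the last checkpoint is lost and must be redone.
            task.done = task.checkpoint
    return survivors


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3"]
    tasks = [Task("a", 100, node="n1"), Task("b", 100, node="n2"),
             Task("c", 100, node="n3")]
    for t in tasks:
        run(t, 47, interval=10)
    recover(tasks, "n2", nodes)
    print(tasks[1].node, tasks[1].done)  # prints: n1 40
```

In this run, task `b` loses only the 7 units computed after its last checkpoint rather than all 47. Choosing the checkpoint interval trades checkpointing overhead against the amount of work that must be redone after a fault.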