A novel checkpoint mechanism based on job progress description for computational grid

  • Authors:
  • Chunjiang Li; Xuejun Yang; Nong Xiao

  • Affiliations:
  • School of Computer, National University of Defense Technology, Changsha, China;School of Computer, National University of Defense Technology, Changsha, China;School of Computer, National University of Defense Technology, Changsha, China

  • Venue:
  • ISPA'04 Proceedings of the Second International Conference on Parallel and Distributed Processing and Applications
  • Year:
  • 2004

Abstract

In this paper, we argue that application-level uncoordinated checkpointing with user-defined checkpoint data is the favorable approach in grid environments, where heterogeneity is pervasive. We present a novel application-level uncoordinated checkpoint protocol based on a Job Progress Description (JPD), which is composed of a Job Progress Record Object and a group of Job Progress State Objects. These two kinds of objects act as the checkpoint data for the job, and their methods can be used as checkpoint APIs. Extended with sender-based message logging, the protocol can be used by message-passing applications in a computational grid. Emulation with a master-worker message-passing application shows that this checkpointing protocol can dramatically reduce the wall-clock time of the application when a failure occurs.
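
The abstract gives no API details, so the following is only a minimal sketch of how application-level checkpointing with user-defined data might look for a single process. The names `JobProgressState`, `JobProgressRecord`, `commit`, `run_job`, and the JSON file layout are all hypothetical stand-ins, not the authors' interface, and sender-based message logging is not covered.

```python
import json
import os
from dataclasses import dataclass, field, asdict
from typing import Optional


@dataclass
class JobProgressState:
    """Hypothetical stand-in for a Job Progress State Object:
    the user-defined checkpoint data for one unit of work."""
    task_id: int
    done: bool = False
    result: Optional[float] = None


@dataclass
class JobProgressRecord:
    """Hypothetical stand-in for the Job Progress Record Object:
    it tracks every state object and persists them to one file."""
    path: str
    states: dict = field(default_factory=dict)

    def save(self):
        # Write to a temporary file, then rename, so a crash during
        # the write cannot corrupt the previous checkpoint.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({str(k): asdict(v) for k, v in self.states.items()}, f)
        os.replace(tmp, self.path)

    @classmethod
    def load(cls, path):
        record = cls(path)
        if os.path.exists(path):
            with open(path) as f:
                raw = json.load(f)
            record.states = {int(k): JobProgressState(**v) for k, v in raw.items()}
        return record

    def commit(self, state):
        # Acts as the "checkpoint API": record one state and persist it.
        self.states[state.task_id] = state
        self.save()


def run_job(path, n_tasks, fail_after=None):
    """Run tasks 0..n_tasks-1, skipping tasks already recorded as done.
    Returns the number of tasks executed in this run."""
    record = JobProgressRecord.load(path)
    executed = 0
    for task_id in range(n_tasks):
        prev = record.states.get(task_id)
        if prev is not None and prev.done:
            continue  # recovered from the checkpoint, no recomputation
        if fail_after is not None and executed >= fail_after:
            raise RuntimeError("simulated failure")
        record.commit(JobProgressState(task_id, True, float(task_id ** 2)))
        executed += 1
    return executed
```

Calling `run_job` again with the same path after a failure resumes from the last committed state, which illustrates why the wall-clock time after a failure can drop: only the unfinished tasks are executed again.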