Checkpointing for workflow recovery

  • Authors: ZongWei Luo
  • Affiliation: The University of Georgia, Athens, GA
  • Venue: ACM-SE 38 Proceedings of the 38th annual on Southeast regional conference
  • Year: 2000

Abstract

Workflow technology aims to support reliable and scalable execution, so that workflow management systems (WfMS) can support large-scale, multi-system applications that involve both humans and legacy systems in distributed and often heterogeneous environments. In case of failure, workflow processes usually need to resume execution from one of their saved states, called a checkpoint, which is obtained by persistently saving the process state from time to time. Restoring a checkpoint and resuming execution from it is called rollback. These techniques have long been used in database systems. A checkpoint is action-consistent if it represents a state between complete update operations. In the database domain, a consistent state is one in which no update transactions are active, and a checkpoint representing such a state is a transaction-consistent checkpoint. A checkpoint does not need to satisfy any consistency constraints, but recovery after failure must always guarantee that the resulting state is transaction consistent, even if the checkpoint used is not. A checkpoint can be either local or global. A local checkpoint is taken locally, with or without cooperation with checkpointing activities at other sites, and can be either fuzzy or consistent. During global reconstruction, a set of local checkpoints, usually taken at different sites, is used to find a globally consistent state. To facilitate global reconstruction, a global checkpoint, derived from a set of local checkpoints taken at different sites, provides a rollback boundary, thus reducing recovery time.
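The abstract does not say how a globally consistent state is found from a set of local checkpoints. As an illustration only, the sketch below uses the classic rollback-propagation technique from distributed checkpointing, which is not necessarily the paper's method. It assumes a hypothetical model in which every site numbers its local events from 1, a local checkpoint at position `p` records the state after `p` events (position 0 is the initial state), and a set of local checkpoints is globally consistent when no message is recorded as received but not as sent (an "orphan" message). The names `Message`, `is_orphan`, and `recovery_line` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical model: positions are 1-based local event numbers.
    sender: str
    send_pos: int    # event number of the send on the sender's site
    receiver: str
    recv_pos: int    # event number of the receive on the receiver's site


def is_orphan(m: Message, cut: dict[str, int]) -> bool:
    """True if the cut records the receive of m but not its send."""
    return m.recv_pos <= cut[m.receiver] and m.send_pos > cut[m.sender]


def recovery_line(checkpoints: dict[str, list[int]],
                  messages: list[Message]) -> dict[str, int]:
    """Return the latest globally consistent set of local checkpoints.

    checkpoints maps each site to its local checkpoint positions and is
    assumed to include position 0 (the initial state) for every site.
    """
    # Start from the most recent local checkpoint at every site.
    cut = {site: max(positions) for site, positions in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for m in messages:
            if is_orphan(m, cut):
                # Roll the receiver back to its latest checkpoint taken
                # before the receive; 0 always qualifies since recv_pos >= 1.
                cut[m.receiver] = max(
                    p for p in checkpoints[m.receiver] if p < m.recv_pos
                )
                changed = True
    return cut


if __name__ == "__main__":
    cps = {"A": [0, 3], "B": [0, 2, 5]}
    msgs = [Message("A", 4, "B", 4), Message("B", 3, "A", 2)]
    print(recovery_line(cps, msgs))
```

Each rollback moves one site strictly earlier, and the all-initial-state cut has no orphans, so the loop always terminates. With the two messages in the usage example, rolling back B forces A back to its initial state (the cascading "domino effect"), which shows why a global checkpoint acting as a rollback boundary can bound recovery time.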