SCR algorithm: saving/restoring states of file systems

  • Authors:
  • Wei Xiao-Hui; Ju Jiu-Bin

  • Affiliations:
  • Department of Computer Science, Jilin University, Changchun 130023, China; Department of Computer Science, Jilin University, Changchun 130023, China

  • Venue:
  • ACM SIGOPS Operating Systems Review
  • Year:
  • 1999

Abstract

Fault tolerance is very important in cluster computing. Many well-known cluster-computing systems implement fault tolerance through a checkpoint/restart mechanism. However, existing checkpointing algorithms cannot restore the state of a file system when a program's execution is rolled back, so existing fault-tolerant systems place many restrictions on file access. This paper presents the SCR algorithm, which is based on atomic operations and consistent schedules and can restore the state of file systems. In the SCR algorithm, system calls on file systems are classified into idempotent and non-idempotent operations: a non-idempotent operation modifies the file system's state, and an idempotent operation does not. The SCR algorithm dynamically tracks a program's execution and logs to disk each non-idempotent operation the program issues, together with the information needed to undo that operation. When the program is rolled back to a checkpoint, the SCR algorithm reverts the file system state to what it was at the time of the last checkpoint. With the SCR algorithm, users may use any file operation in their programs.
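
The logging idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `UndoLog`, its methods, and the `demo` function are hypothetical names, and the sketch assumes that whole-file backup copies are an acceptable form of "information needed to undo an operation". The abstract does not specify the log format, and the atomicity and scheduling aspects of SCR are not modeled here.

```python
import os
import shutil
import tempfile


class UndoLog:
    """Hypothetical undo log for file operations (illustration only)."""

    def __init__(self, backup_dir):
        self.backup_dir = backup_dir
        self.entries = []  # undo records written since the last checkpoint

    def _save_copy(self, path):
        # Assumption: a full copy of the file serves as the undo information.
        fd, backup = tempfile.mkstemp(dir=self.backup_dir)
        os.close(fd)
        shutil.copy2(path, backup)
        return backup

    def checkpoint(self):
        # After a new checkpoint, older undo information is no longer needed.
        for _kind, _path, backup in self.entries:
            if backup is not None and os.path.exists(backup):
                os.remove(backup)
        self.entries.clear()

    def read_file(self, path):
        # Idempotent in the abstract's sense: no state change, nothing logged.
        with open(path) as f:
            return f.read()

    def write_file(self, path, data):
        # Non-idempotent: log how to undo the write before performing it.
        if os.path.exists(path):
            self.entries.append(("restore", path, self._save_copy(path)))
        else:
            self.entries.append(("delete", path, None))
        with open(path, "w") as f:
            f.write(data)

    def remove_file(self, path):
        # Non-idempotent: keep a copy so the file can be brought back.
        self.entries.append(("restore", path, self._save_copy(path)))
        os.remove(path)

    def rollback(self):
        # Undo logged operations in reverse order, back to the last checkpoint.
        while self.entries:
            kind, path, backup = self.entries.pop()
            if kind == "restore":
                shutil.copy2(backup, path)
                os.remove(backup)
            elif kind == "delete" and os.path.exists(path):
                os.remove(path)


def demo():
    with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as bak:
        log = UndoLog(bak)
        a = os.path.join(work, "a.txt")
        b = os.path.join(work, "b.txt")
        log.write_file(a, "v1")
        log.checkpoint()          # file system state to return to
        log.write_file(a, "v2")   # overwrite an existing file
        log.write_file(b, "new")  # create a new file
        log.remove_file(a)        # delete a file
        log.rollback()
        return log.read_file(a), os.path.exists(b)
```

Calling `demo()` exercises an overwrite, a file creation, and a deletion after a checkpoint, then rolls all three back. Undoing in reverse order matters in this sketch because later operations may depend on the effects of earlier ones.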