Fault-Tolerant File-I/O for Portable Checkpointing Systems

  • Authors:
  • Igor Lyubashevskiy; Volker Strumpen

  • Affiliations:
  • Oracle Corp., 781-768-5600, Facilities, 6th Floor, 200 Fifth Avenue, Waltham, MA 02451-8779, ilyubash@us.oracle.com; Yale University, Departments of Electrical Engineering and Computer Science, strumpen@ee.yale.edu

  • Venue:
  • The Journal of Supercomputing - Special issue on embedded fault-tolerance systems
  • Year:
  • 2000

Abstract

The ftIO-system provides portable and fault-tolerant file-I/O by enhancing the functionality of the ANSI C file system without changing its application programmer interface and without depending on system-specific implementations of the standard file operations. The ftIO-system is an extension of the porch compiler and its runtime system. The porch compiler automatically generates code to save bookkeeping information about ftIO's transactional file operations in portable checkpoints. These portable checkpoints can be recovered on a binary-incompatible architecture. We developed a new algorithm for supporting transactional file operations in ftIO. Rather than using the well-known two-phase commit protocol, this algorithm uses only a single bit of information and an atomic rename file operation to guarantee fault tolerance. In this paper, we describe our new ftIO algorithm, discuss design choices for ftIO, and provide experimental data from our ftIO prototype.
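
The abstract does not spell out the ftIO algorithm itself, so the following is only a minimal sketch of the general idea behind committing a file update with an atomic rename, using standard C file operations. It is not the authors' implementation and does not model the single bit of state or the checkpoint bookkeeping. The function name `commit_file` and the `.tmp` suffix are hypothetical. The sketch also assumes a platform (such as POSIX) on which `rename` atomically replaces an existing destination; ISO C leaves that case implementation-defined.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: replace the contents of
 * `path` with `len` bytes from `data` by writing a shadow file first and
 * then renaming it over the original.
 *
 * Assumption: rename() atomically replaces an existing destination, as
 * POSIX specifies. Under that assumption a crash leaves either the old
 * file or the new file at `path`, never a partially written one.
 *
 * Returns 0 on success, -1 on failure.
 */
int commit_file(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    FILE *fp;
    int n;

    /* Build the shadow file name (the ".tmp" suffix is an assumption). */
    n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* Write the new contents to the shadow file, not to the original. */
    fp = fopen(tmp, "wb");
    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, len, fp) != len) {
        fclose(fp);
        remove(tmp);
        return -1;
    }
    /* fclose flushes the stream; a failure here means the data is suspect. */
    if (fclose(fp) != 0) {
        remove(tmp);
        return -1;
    }

    /* Commit point: the rename makes the new contents visible at `path`. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }
    return 0;
}
```

A caller would invoke `commit_file("results.dat", buf, n)` at the point where the update should take effect. Flushing the stream does not force data to stable storage, so a durable variant would need a platform-specific sync step, which this sketch leaves out.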