NT-SwiFT: software implemented fault tolerance on Windows NT

  • Authors:
  • Deron Liang; P. Emerald Chung; Yennun Huang; Chandra Kintala; Woei-Jyh Lee; Timothy K. Tsai; Chung-Yih Wang

  • Affiliations:
  • Department of Computer Science, National Taiwan Ocean University, Keelung 202, Taiwan; Siebel Systems, Suite 2100, 411 108th Ave NE, Bellevue, WA; AT&T Research Labs, 180 Park Avenue, P.O. Box 971, Florham Park, NJ; Network Software Research, Avaya Labs, 233 Mount Airy Road, Basking Ridge, NJ; Department of Computer Science, University of Maryland, College Park, MD; Network Software Research, Avaya Labs, 233 Mount Airy Road, Basking Ridge, NJ; AT&T Research Labs, 180 Park Avenue, P.O. Box 971, Florham Park, NJ

  • Venue:
  • Journal of Systems and Software
  • Year:
  • 2004

Abstract

There is increasing demand to make application software more tolerant of failures. Fault-tolerant applications detect and recover from failures that are not handled by the application's underlying hardware or operating system. In recent years, a growing number of highly available applications have been implemented on Windows NT. However, current versions of Windows (NT 4.0, 2000) and their utilities, such as Microsoft Cluster Server (MSCS), do not provide some of the facilities needed to implement fault-tolerant applications, such as transparent checkpointing and message logging. In this paper, we describe a set of reusable software components, collectively named software implemented fault tolerance (NT-SwiFT), that facilitates building fault-tolerant and highly available applications on Windows NT and Windows 2000. NT-SwiFT provides components for automatic error detection and recovery, checkpointing, event logging and replay, communication error recovery, and incremental data replication. Using NT-SwiFT, we conducted fault injection experiments on three commercial server applications (Apache web server, Microsoft IIS web server, and Microsoft SQL Server) to study the failure coverage and the overhead of the NT-SwiFT components. Preliminary results show that NT-SwiFT can detect and recover from more application failures than MSCS in all three applications.