There is increasing demand to make application software more tolerant of failures. Fault-tolerant applications detect and recover from failures that are not handled by the underlying hardware or operating system. In recent years, a growing number of highly available applications have been implemented on Windows NT. However, the current versions of Windows (NT 4.0, 2000) and their utilities, such as Microsoft Cluster Server (MSCS), do not provide some facilities, such as transparent checkpointing and message logging, that are needed to implement fault-tolerant applications. In this paper, we describe NT-SwiFT (software implemented fault tolerance), a set of reusable software components that facilitates building fault-tolerant and highly available applications on Windows NT and Windows 2000. NT-SwiFT provides components for automatic error detection and recovery, checkpointing, event logging and replay, communication error recovery, and incremental data replication. Using NT-SwiFT, we conducted fault injection experiments on three commercial server applications (Apache web server, Microsoft IIS web server, and Microsoft SQL Server) to study the failure coverage and overhead of the NT-SwiFT components. Preliminary results show that NT-SwiFT can detect and recover from more application failures than MSCS in all three applications.