IEEE Transactions on Software Engineering
This paper studies the effect of the primary site approach to fault tolerance on response time. In the primary site approach, the service to be made fault tolerant is replicated at many nodes, one of which is designated as the primary and the others as backups. All requests for operations on the data object are sent to the primary site. If the primary fails, one of the backups takes over as primary. The primary site periodically checkpoints its state on the backups. An analytical model is presented for the average response time of the primary site system, and it is used to analyze how the checkpointing frequency and the degree of replication affect response time. The model is also used to compare the response time of the system with that of a system without any fault tolerance.
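The primary site mechanism described above can be illustrated with a toy, in-memory sketch. This is not the analytical model from the paper. The class names, the `checkpoint_every` parameter (checkpointing after a fixed number of writes rather than on a timer), and the rule that the next replica in the list takes over are all assumptions made for illustration. The sketch also assumes that writes made since the last checkpoint are simply lost on failover, which the abstract does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the service; `state` is a hypothetical key-value store."""
    name: str
    state: dict = field(default_factory=dict)


class PrimarySiteService:
    """Toy primary/backup service (illustrative assumption, not the paper's model)."""

    def __init__(self, names, checkpoint_every):
        # The first replica acts as primary; the rest are backups.
        self.replicas = [Replica(n) for n in names]
        self.checkpoint_every = checkpoint_every
        self.since_checkpoint = 0

    @property
    def primary(self):
        return self.replicas[0]

    def write(self, key, value):
        # All requests are served by the primary only.
        self.primary.state[key] = value
        self.since_checkpoint += 1
        if self.since_checkpoint >= self.checkpoint_every:
            self.checkpoint()

    def read(self, key):
        return self.primary.state.get(key)

    def checkpoint(self):
        # Copy the primary's state to every backup.
        for backup in self.replicas[1:]:
            backup.state = dict(self.primary.state)
        self.since_checkpoint = 0

    def fail_primary(self):
        # The next replica takes over, starting from its last checkpoint.
        if len(self.replicas) < 2:
            raise RuntimeError("no backup left to take over")
        self.replicas.pop(0)
        self.since_checkpoint = 0


svc = PrimarySiteService(["n1", "n2", "n3"], checkpoint_every=2)
svc.write("x", 1)
svc.write("y", 2)   # second write triggers a checkpoint
svc.write("z", 3)   # not yet checkpointed
svc.fail_primary()  # n2 takes over from the last checkpoint
```

In this sketch, a smaller `checkpoint_every` means less work is lost on failover but more copying per request, and each additional backup adds to the cost of every checkpoint. This is the kind of trade-off between checkpointing frequency, degree of replication, and response time that the abstract says the analytical model examines.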