Cloud computing refers both to the applications delivered as services over the Internet and to the hardware and systems software in the datacenters that provide those services. Failures of all kinds are common in current datacenters, partly because of the large number of nodes. Fault tolerance has therefore become a major concern for computer engineers and software developers: faults increase the cost of using resources and make it harder to meet user expectations, the most fundamental of which is that an application finishes correctly regardless of faults in the nodes. This paper proposes a fault-tolerant architecture for cloud computing that uses an adaptive checkpoint mechanism to ensure that a running task finishes correctly despite faults in the nodes on which it runs. The proposed architecture is both transparent and scalable.
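The abstract does not say how the checkpoint mechanism adapts. As one illustration only, the sketch below assumes a swapped-in technique, Young's first-order approximation for the checkpoint interval, T ≈ sqrt(2 · C · MTBF), where C is the cost of writing a checkpoint and the MTBF is re-estimated from failures observed on the node. The class `AdaptiveCheckpointer` and the function `run_with_checkpoints` are hypothetical names, not the paper's design; the second function shows the basic rollback-recovery idea of resuming a task from its last saved checkpoint after a node fault.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AdaptiveCheckpointer:
    """Hypothetical adaptive checkpoint scheduler (an assumption, not the paper's method).

    Uses Young's approximation T = sqrt(2 * C * MTBF) and re-estimates the
    node's MTBF from the gaps between observed failures.
    """
    checkpoint_cost: float  # C: seconds needed to write one checkpoint
    initial_mtbf: float     # prior MTBF estimate in seconds
    failure_gaps: list[float] = field(default_factory=list)

    def mtbf(self) -> float:
        # Use the prior until at least one failure has been observed.
        if not self.failure_gaps:
            return self.initial_mtbf
        return sum(self.failure_gaps) / len(self.failure_gaps)

    def record_failure(self, seconds_since_last_failure: float) -> None:
        # Each new failure updates the MTBF estimate, so the interval adapts.
        self.failure_gaps.append(seconds_since_last_failure)

    def interval(self) -> float:
        # Checkpoint more often on failure-prone nodes, less often on stable ones.
        return math.sqrt(2 * self.checkpoint_cost * self.mtbf())


def run_with_checkpoints(total_work: int, checkpoint_every: int,
                         fail_at: set[int]) -> tuple[int, int]:
    """Simulate a task of `total_work` units that crashes at the global steps in
    `fail_at` and resumes from its last checkpoint.

    Returns (steps executed, including re-executed work; checkpoints taken).
    """
    done = 0         # progress held in volatile memory
    saved = 0        # progress persisted by the last checkpoint
    steps = 0
    checkpoints = 0
    while done < total_work:
        steps += 1
        if steps in fail_at:
            done = saved  # node fault: volatile progress is lost, roll back
            continue
        done += 1
        if done % checkpoint_every == 0:
            saved = done  # persist progress
            checkpoints += 1
    return steps, checkpoints
```

For example, with `checkpoint_cost=10.0` and `initial_mtbf=3600.0`, `interval()` starts at sqrt(72000) ≈ 268 s; after `record_failure(400.0)` and `record_failure(500.0)` the MTBF estimate drops to 450 s and the interval shrinks to sqrt(9000) ≈ 95 s. In the simulation, `run_with_checkpoints(10, 4, {6})` still completes all 10 units, redoing only the work lost since the checkpoint at unit 4 and finishing in 12 steps.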