In this paper, fault, error, and failure in a cloud are defined, and the principles for achieving high fault tolerance are systematically analysed with reference to fault tolerance theories suited to large-scale distributed computing environments. Based on the principles and semantics of cloud fault tolerance, a dynamic adaptive fault tolerance strategy, DAFT, is put forward. The work comprises: (1) analysing the mathematical relationship between different failure rates and the checkpointing fault tolerance strategy; (2) building a dynamic adaptive checkpointing fault tolerance model to maximise serviceability and meet the SLOs; and (3) evaluating the dynamic adaptive fault tolerance strategy under various conditions in large-scale cloud data centres, considering different system-centric parameters such as fault tolerance degree and fault tolerance overhead. Both theoretical and experimental results demonstrate that DAFT provides efficient fault tolerance enhancements.
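To make the link between failure rate and checkpointing concrete, the sketch below uses Young's classical first-order approximation for the checkpoint interval, T = sqrt(2C/λ), where C is the cost of writing one checkpoint and λ is the failure rate. This is not the DAFT model, which the abstract does not specify; it is a well-known stand-in chosen for illustration. The function names, the parameter names, and the idea of re-estimating λ from a sliding observation window are all assumptions introduced here.

```python
import math


def estimate_failure_rate(failures_observed: int, observation_window_s: float) -> float:
    """Estimate the failure rate (failures per second) from a recent window.

    Hypothetical helper: a simple count-over-time estimator, assumed here
    only to show how an adaptive scheme could refresh its failure rate.
    """
    if observation_window_s <= 0:
        raise ValueError("observation window must be positive")
    if failures_observed < 0:
        raise ValueError("failure count cannot be negative")
    return failures_observed / observation_window_s


def young_interval(checkpoint_cost_s: float, failure_rate_per_s: float) -> float:
    """Young's first-order approximation of the checkpoint interval.

    T = sqrt(2 * C / lambda), with C the checkpoint cost in seconds and
    lambda the failure rate in failures per second.
    """
    if checkpoint_cost_s <= 0 or failure_rate_per_s <= 0:
        raise ValueError("checkpoint cost and failure rate must be positive")
    return math.sqrt(2.0 * checkpoint_cost_s / failure_rate_per_s)


def adapted_interval(checkpoint_cost_s: float,
                     failures_observed: int,
                     observation_window_s: float) -> float:
    """Recompute the interval from the most recently observed failure rate."""
    rate = estimate_failure_rate(failures_observed, observation_window_s)
    return young_interval(checkpoint_cost_s, rate)
```

Under this approximation the interval scales with the inverse square root of the failure rate, so a fourfold increase in observed failures halves the time between checkpoints. An adaptive strategy in this spirit would call `adapted_interval` periodically as new failure observations arrive.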