Modelling and evaluating a high serviceability fault tolerance strategy in cloud computing environments

  • Authors:
  • Dawei Sun;Guiran Chang;Changsheng Miao;Xingwei Wang

  • Affiliations:
  • School of Information Science and Engineering, Northeastern University, Shenyang 110004, China (all authors)

  • Venue:
  • International Journal of Security and Networks
  • Year:
  • 2012

Abstract

This paper defines fault, error, and failure in a cloud and systematically analyses the principles for achieving high fault tolerance, drawing on fault tolerance theories suited to large-scale distributed computing environments. Based on the principles and semantics of cloud fault tolerance, a dynamic adaptive fault tolerance strategy (DAFT) is put forward. It includes: (1) analysing the mathematical relationship between different failure rates and the checkpointing fault tolerance strategy; (2) building a dynamic adaptive checkpointing fault tolerance model to maximise serviceability and meet the service level objectives (SLOs); and (3) evaluating the dynamic adaptive fault tolerance strategy under various conditions in large-scale cloud data centres, considering different system-centric parameters such as fault tolerance degree and fault tolerance overhead. Theoretical as well as experimental results conclusively demonstrate that DAFT has high potential, as it provides efficient fault tolerance enhancements.