An adaptive task-level fault-tolerant approach to Grid

  • Authors:
  • Yongwei Wu; Yulai Yuan; Guangwen Yang; Weimin Zheng

  • Affiliations:
  • Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, People's Republic of China 100084

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2010

Abstract

A strong failure recovery mechanism that handles diverse failures in heterogeneous and dynamic Grids is essential to ensure the complete execution of long-running applications. Although various efforts have been made to address this issue, existing solutions either employ only a single fault-tolerant technique without considering the diversity of failures, or propose frameworks that cannot adaptively handle the various kinds of failures that occur in Grid. In this paper, we propose an adaptive task-level fault-tolerant approach to Grid. By integrating basic fault-tolerant techniques, the approach aims to handle a fairly comprehensive set of the failures that arise in Grid environments. Moreover, this paper argues that resource consumption, which has not received enough attention, is also an important evaluation metric for any fault-tolerant approach. We construct evaluation models based on mean execution time and resource consumption that can be used to evaluate any fault-tolerant approach. Based on these models, we demonstrate the effectiveness of our approach and illustrate the performance gains it achieves through simulations. Experiments on a real Grid show that our approach achieves better performance and consumes fewer resources.
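The abstract does not reproduce the paper's evaluation models, so the sketch below is only a hypothetical illustration of why mean execution time and resource consumption can pull in opposite directions. It assumes a task of length `T` that restarts from scratch after each failure, failures arriving as a Poisson process with rate `lam`, zero recovery delay, and replicas that are all cancelled as soon as one finishes. None of these assumptions, nor the names `expected_time_retry`, `simulate_restart` and `simulate`, come from the paper. Under these assumptions a single retried task has the classic mean completion time (exp(lam*T) - 1)/lam, which the Monte Carlo estimate can be checked against.

```python
import math
import random


def expected_time_retry(T, lam):
    """Closed-form mean completion time of a task of length T that restarts
    from scratch after every failure (Poisson failures with rate lam,
    zero recovery delay). Illustrative model, not the paper's."""
    return (math.exp(lam * T) - 1.0) / lam


def simulate_restart(T, lam, rng):
    """Wall-clock time for one replica that restarts from scratch on failure."""
    elapsed = 0.0
    while True:
        failure_at = rng.expovariate(lam)  # time until the next failure
        if failure_at >= T:
            return elapsed + T  # this attempt finishes before failing
        elapsed += failure_at  # work lost; restart immediately


def simulate(T, lam, replicas, trials=20000, seed=1):
    """Monte Carlo estimate of (mean completion time, mean resource consumption).

    Resource consumption is modelled as replicas * completion time, assuming
    every replica keeps running until the first one finishes and the rest are
    then cancelled (a hypothetical accounting rule)."""
    rng = random.Random(seed)
    total_time = 0.0
    for _ in range(trials):
        finish = min(simulate_restart(T, lam, rng) for _ in range(replicas))
        total_time += finish
    mean_time = total_time / trials
    return mean_time, replicas * mean_time
```

For example, comparing `simulate(10.0, 0.05, replicas=1)` with `simulate(10.0, 0.05, replicas=2)` shows replication lowering the mean completion time while raising total resource consumption, since two replicas can never finish before `T` and so always consume at least `2 * T`. This is the kind of trade-off that an evaluation using both metrics is meant to expose.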