A Task-Based Fault-Tolerance Mechanism to Hierarchical Master/Worker with Divisible Tasks

  • Authors:
  • Zhihui Dai; Fabien Viale; Xuebin Chi; Denis Caromel; Zhonghua Lu

  • Venue:
  • HPCC '09 Proceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications
  • Year:
  • 2009

Abstract

The Master/Worker API of the ProActive middleware provides an easy-to-use framework for parallelizing embarrassingly parallel applications. However, the traditional Master/Worker model faces significant challenges as distributed computing systems scale up. A single-layer hierarchical Master/Worker has been implemented to address the scalability issues of the Master/Worker API. In the new framework, the MainMaster communicates only with a set of SubMasters, and each SubMaster manages a set of workers. A “Bully Election Algorithm” and an “object discovery mechanism” are implemented to handle failures of the SubMasters. An automatic load-balancing mechanism is implemented so that the hierarchical Master/Worker can process divisible tasks. Moreover, the fault-tolerance mechanism has been optimized to make it more efficient.
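
The abstract names the Bully Election Algorithm as the way a failed SubMaster is replaced but does not describe the implementation. The following is a minimal sketch of the generic algorithm only, not the authors' ProActive code. It assumes a simplified synchronous model in which each node has an integer id and a known alive/failed status; the function name `bully_election` and the `alive` mapping are hypothetical names introduced for illustration.

```python
from typing import Dict, Optional


def bully_election(alive: Dict[int, bool], initiator: int) -> Optional[int]:
    """Return the id of the elected coordinator, or None if the initiator is down.

    Simplified, synchronous sketch (an assumption, not the paper's code):
    the current candidate "asks" every node with a higher id. If any of them
    is alive, the candidate stands down and a live higher node runs the
    election in its place. A node that finds no live higher node wins.
    """
    if not alive.get(initiator, False):
        return None  # a failed node cannot start an election

    candidate = initiator
    while True:
        # Nodes with a higher id that would answer the election message.
        responders = [n for n in alive if n > candidate and alive[n]]
        if not responders:
            return candidate  # nobody higher answered: candidate is coordinator
        # A live higher node takes over the election.
        candidate = min(responders)


if __name__ == "__main__":
    # Hypothetical SubMaster ids; node 3 has failed.
    status = {1: True, 2: True, 3: False, 4: True}
    print(bully_election(status, initiator=1))
```

In this model the winner is always the live node with the highest id that is reachable from the initiator, which is the defining property of the bully approach. A real deployment would replace the `alive` mapping with timeouts on election messages.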