Distributed Throughput Optimization for Large-Scale Scientific Workflows Under Fault-Tolerance Constraint

  • Authors:
  • Yi Gu;Chase Qishi Wu;Xin Liu;Dantong Yu

  • Affiliations:
  • Department of Management, Marketing, Computer Science & Info System, The University of Tennessee at Martin, Martin, USA 38237;Department of Computer Science, The University of Memphis, Memphis, USA 38152;Computational Science Center, Brookhaven National Laboratory, Upton, USA 11973;Computational Science Center, Brookhaven National Laboratory, Upton, USA 11973

  • Venue:
  • Journal of Grid Computing
  • Year:
  • 2013

Abstract

With the advent of next-generation scientific applications, the workflow approach that integrates various computing and networking technologies has provided a viable solution for managing and optimizing large-scale distributed data transfer, processing, and analysis. This paper investigates the problem of mapping distributed scientific workflows for maximum throughput in faulty networks where nodes and links are subject to probabilistic failures. We formulate this problem as a bi-objective optimization problem that maximizes both throughput and reliability. By adapting and modifying a centralized fault-free workflow mapping scheme, we propose a new mapping algorithm that achieves high throughput for smooth data flow in a distributed manner while satisfying a pre-specified bound on the overall failure rate for a guaranteed level of reliability. The performance superiority of the proposed solution is illustrated by both extensive simulation-based comparisons with existing algorithms and experimental results from a real-life scientific workflow deployed in wide-area networks.
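The trade-off between throughput and the failure-rate bound can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes that throughput is the reciprocal of the slowest (bottleneck) stage time, that component failures are independent, and that a small set of candidate mappings is already given. The function names, the dictionary layout, and all numbers are hypothetical.

```python
from math import prod


def throughput(stage_times):
    """Pipeline throughput, assumed to be 1 / (bottleneck stage time)."""
    return 1.0 / max(stage_times)


def overall_failure_rate(failure_probs):
    """Probability that at least one node or link fails.

    Assumes failures are independent (an illustrative simplification).
    """
    return 1.0 - prod(1.0 - p for p in failure_probs)


def best_feasible_mapping(candidates, failure_bound):
    """Return the highest-throughput candidate within the failure bound.

    Returns None when no candidate satisfies the bound.
    """
    feasible = [
        c for c in candidates
        if overall_failure_rate(c["failure_probs"]) <= failure_bound
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda c: throughput(c["stage_times"]))


# Hypothetical candidate mappings: per-stage times (seconds) and
# per-component failure probabilities for the nodes and links used.
candidates = [
    {"name": "A", "stage_times": [2.0, 4.0, 1.0], "failure_probs": [0.1, 0.1]},
    {"name": "B", "stage_times": [2.0, 5.0, 2.0], "failure_probs": [0.01, 0.02]},
]

chosen = best_feasible_mapping(candidates, failure_bound=0.05)
print(chosen["name"] if chosen else "no feasible mapping")
```

In this sketch, mapping A is faster but exceeds the 0.05 failure bound, so the slower and more reliable mapping B is selected. Treating reliability as a constraint rather than a second objective mirrors the "pre-specified bound" described in the abstract.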