Efficiently tolerating failures in asynchronous real-time distributed systems

  • Authors:
  • Peng Li; Binoy Ravindran

  • Affiliations:
  • Real-Time Systems Laboratory, Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 1776 Liberty Lane, D41, Blacksburg, VA; Real-Time Systems Laboratory, Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 1776 Liberty Lane, D41, Blacksburg, VA

  • Venue:
  • Journal of Systems Architecture: the EUROMICRO Journal
  • Year:
  • 2004

Abstract

We present a proactive resource allocation algorithm, called BEA, for fault-tolerant asynchronous real-time distributed systems. BEA considers an application model in which trans-node application timeliness requirements are expressed using benefit functions and the anticipated workload during future time intervals is expressed using adaptation functions. BEA also considers an adaptation model in which subtasks of application tasks are replicated at run-time, both to tolerate failures and to share workload increases. Given these models, the objective of the algorithm is to maximize the aggregate real-time benefit and the ability to tolerate host failures during the time window of the adaptation functions. Since determining the optimal solution is computationally intractable, BEA heuristically computes suboptimal resource allocations in polynomial time. We show that BEA achieves almost the same fault-tolerance ability as full replication and accrues most of the real-time benefit that full replication can accrue. At the same time, BEA requires far fewer replicas than full replication and is therefore cost-effective.
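
The abstract does not give BEA's details, so the following is only an illustrative sketch of the general idea of heuristic, benefit-driven replication, not the authors' algorithm. Everything in it is an assumption made for illustration: the shape of the benefit function (full benefit up to a deadline, then linear decay to zero at twice the deadline), the even split of workload among replicas, the replica budget, and all names (`Subtask`, `benefit`, `greedy_allocate`). It covers only the workload-sharing side of the problem; the fault-tolerance objective is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Subtask:
    name: str
    workload: float   # hypothetical: anticipated work in the adaptation window
    replicas: int = 1


def benefit(latency: float, deadline: float, max_benefit: float = 1.0) -> float:
    """Hypothetical benefit function: full benefit up to the deadline,
    then linear decay, reaching zero at twice the deadline."""
    if latency <= deadline:
        return max_benefit
    if latency >= 2 * deadline:
        return 0.0
    return max_benefit * (2 * deadline - latency) / deadline


def greedy_allocate(subtasks, deadline, replica_budget):
    """Give one extra replica at a time to the subtask whose benefit
    improves the most; stop when nothing improves or the budget is spent.
    Assumes (for illustration) that replicas split the workload evenly."""
    used = sum(s.replicas for s in subtasks)
    while used < replica_budget:
        best, best_gain = None, 0.0
        for s in subtasks:
            now = benefit(s.workload / s.replicas, deadline)
            then = benefit(s.workload / (s.replicas + 1), deadline)
            if then - now > best_gain:
                best, best_gain = s, then - now
        if best is None:   # no replica yields a positive gain
            break
        best.replicas += 1
        used += 1
    return subtasks


tasks = greedy_allocate(
    [Subtask("a", workload=8.0), Subtask("b", workload=30.0)],
    deadline=10.0,
    replica_budget=5,
)
total = sum(benefit(s.workload / s.replicas, 10.0) for s in tasks)
```

In this toy run the heavily loaded subtask `b` receives the extra replicas and the allocation stops before the budget is exhausted, which mirrors the abstract's point that a heuristic can approach the benefit of full replication with fewer replicas. A greedy step like this can also stall when a single extra replica yields no gain, which is one reason such heuristics are suboptimal.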