A Versatile, Proactive Dependability Approach to Handling Unanticipated Events in Distributed Systems

  • Authors:
  • Priya Narasimhan;Raj Rajkumar;Gautam Thaker;Patrick Lardieri

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA;Carnegie Mellon University, Pittsburgh, PA;Lockheed Martin ATL, Cherry Hill, NJ;Lockheed Martin ATL, Cherry Hill, NJ

  • Venue:
  • IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 2 - Volume 03
  • Year:
  • 2005

Abstract

The MEAD system that we are developing employs a synergistic combination of a reactive and a proactive fault-tolerance approach in order to address unanticipated events and hazards in real-time, fault-tolerant distributed systems. The reactive fault-tolerance approach involves active monitoring of the system to adapt the provided QoS and to allocate resources based on current conditions in the system. The proactive approach involves monitoring both the distributed applications and the network to seek precursors to imminent failures, and then to trigger fault-recovery mechanisms in advance of the occurrence of the failure. The underlying ideas of the MEAD system have demonstrated initial promise through our enhanced capabilities to handle failures and unanticipated events, and to reduce jitter under faulty conditions.
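
The proactive idea described in the abstract (watch for a precursor, then start recovery before the failure occurs) can be illustrated with a minimal sketch. This is not MEAD's implementation: the class name `ProactiveMonitor`, the single resource-usage metric, the fixed 0.8 threshold, and the "migrate" action are all hypothetical choices made only for illustration.

```python
from typing import Callable, List


class ProactiveMonitor:
    """Hypothetical monitor: fires a recovery action when a precursor is seen."""

    def __init__(self, threshold: float, on_precursor: Callable[[float], None]):
        # threshold is an assumed cutoff; a real system would tune or learn it.
        self.threshold = threshold
        self.on_precursor = on_precursor
        self.triggered = False

    def observe(self, usage: float) -> bool:
        """Record one usage sample; return True if it triggered recovery."""
        if not self.triggered and usage >= self.threshold:
            # Precursor detected: act now, before the resource is exhausted.
            self.triggered = True
            self.on_precursor(usage)
            return True
        return False


# Usage: rising resource usage crosses the assumed threshold at the third sample.
events: List[str] = []
monitor = ProactiveMonitor(0.8, lambda u: events.append(f"migrate at {u:.2f}"))
for sample in [0.50, 0.65, 0.82, 0.95]:
    monitor.observe(sample)
print(events)
```

In this sketch the recovery action fires once, at the first sample at or above the threshold, and later samples are ignored. A reactive scheme, by contrast, would act only after the failure had been detected.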