A fault tolerance approach for distributed systems using monitoring based replication

  • Authors:
  • Alexandru Costan; Ciprian Dobre; Florin Pop; Catalin Leordeanu; Valentin Cristea

  • Affiliations:
  • University Politehnica of Bucharest, Computer Science Department, 313, Splaiul Independentei, 060042, Romania (all authors)

  • Venue:
  • ICCP '10: Proceedings of the 2010 IEEE 6th International Conference on Intelligent Computer Communication and Processing
  • Year:
  • 2010

Abstract

High availability is a desired feature of dependable distributed systems. Replication is a well-known technique for achieving fault tolerance in distributed systems, thereby enhancing availability. We propose a fault tolerance approach for distributed systems that relies on replication techniques driven by monitoring information. Our approach uses both active and passive strategies to implement an optimistic replication protocol. By using a proxy to handle service calls and relying on service replication strategies, we effectively address the associated complexity and overhead. This paper presents an architecture for implementing both the monitoring-based proxy and replication management. Experiments and application tests using an implementation of the architecture are presented. The architecture is shown to be a viable technique for increasing dependability in distributed systems.
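The abstract does not give the protocol's details. Purely as an illustration, the following Python sketch shows one way a proxy could use monitoring data to choose among service replicas, with a passive path (invoke one replica, fail over on error) and an active path (every live replica executes the request, first success is returned). All names here (`Replica`, `MonitoringData`, `ReplicationProxy`, `ranked`, `call_passive`, `call_active`) and the liveness/load fields are hypothetical assumptions, not the authors' API, and the sketch does not model the paper's optimistic replication protocol or any state synchronization between replicas.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Replica:
    # Hypothetical replica: a name plus the callable that serves a request.
    name: str
    handler: Callable[[int], int]


@dataclass
class MonitoringData:
    # Hypothetical monitoring snapshot: liveness and load per replica name.
    alive: Dict[str, bool] = field(default_factory=dict)
    load: Dict[str, float] = field(default_factory=dict)


class ReplicationProxy:
    # Hypothetical proxy: clients call it instead of a concrete replica.
    def __init__(self, replicas: List[Replica], monitor: MonitoringData) -> None:
        self.replicas = replicas
        self.monitor = monitor

    def ranked(self) -> List[Replica]:
        # Keep only replicas the monitor reports as alive, least loaded first.
        live = [r for r in self.replicas if self.monitor.alive.get(r.name, False)]
        return sorted(live, key=lambda r: self.monitor.load.get(r.name, 0.0))

    def call_passive(self, request: int) -> int:
        # Passive strategy: invoke one replica at a time and fail over on error.
        errors: List[str] = []
        for replica in self.ranked():
            try:
                return replica.handler(request)
            except Exception as exc:  # replica failed since the last monitoring update
                errors.append(f"{replica.name}: {exc}")
        raise RuntimeError("no replica served the request: " + "; ".join(errors))

    def call_active(self, request: int) -> int:
        # Active strategy: every live replica executes the request; the proxy
        # returns the result of the first replica (in ranked order) that succeeds.
        results: List[int] = []
        for replica in self.ranked():
            try:
                results.append(replica.handler(request))
            except Exception:
                continue  # a failed replica is simply skipped
        if not results:
            raise RuntimeError("all live replicas failed")
        return results[0]


def crashed(_request: int) -> int:
    # Simulates a replica that crashed after it was last reported alive.
    raise ConnectionError("replica down")


monitor = MonitoringData(alive={"a": True, "b": True, "c": False},
                         load={"a": 0.2, "b": 0.7, "c": 0.1})
proxy = ReplicationProxy(
    [Replica("a", crashed), Replica("b", lambda x: x * 2), Replica("c", lambda x: x * 3)],
    monitor,
)
print(proxy.call_passive(21))  # "a" raises, failover to "b": prints 42
print(proxy.call_active(21))   # "a" fails, "b" succeeds, "c" is skipped as dead: prints 42
```

Replica "c" would return a valid result but is never contacted, because the monitoring snapshot marks it as dead: in this sketch monitoring data drives replica selection, and exceptions only catch failures the monitor has not yet observed. A real active strategy would dispatch the request to replicas concurrently rather than in a sequential loop.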