A Failure Detection System for Large Scale Distributed Systems
International Journal of Distributed Systems and Technologies
High availability is a desirable property of a dependable distributed system. Replication is a well-known technique for achieving fault tolerance in distributed systems and thereby improving availability. We propose a fault-tolerance approach for distributed systems that combines replication techniques with monitoring information. The approach uses both active and passive strategies to implement an optimistic replication protocol. A proxy handles service calls and relies on service replication strategies, which lets us deal effectively with the associated complexity and overhead issues. This paper presents an architecture for implementing the proxy on top of monitoring data and replication management. Experiments and application tests using an implementation of the architecture are presented, and they show the architecture to be a viable technique for increasing dependability in distributed systems.
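
The proxy idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's architecture: the names `Replica`, `ServiceProxy`, `ReplicaUnavailable`, `report`, `call_passive`, and `call_active` are all hypothetical, the monitor is reduced to a single health flag per replica, and the optimistic replication protocol itself (state propagation and reconciliation between replicas) is not modelled. The sketch only shows how a proxy might route a service call under a passive strategy (fail over to the next healthy replica) or an active strategy (invoke every healthy replica).

```python
from dataclasses import dataclass, field
from typing import Callable, List


class ReplicaUnavailable(Exception):
    """Hypothetical error raised when a replica cannot serve a call."""


@dataclass
class Replica:
    name: str
    handler: Callable[[str], str]  # the replicated service operation
    healthy: bool = True           # last status known from monitoring


@dataclass
class ServiceProxy:
    """Routes service calls to replicas using monitoring information."""
    replicas: List[Replica]
    log: List[str] = field(default_factory=list)

    def report(self, name: str, healthy: bool) -> None:
        # Entry point for the (assumed) monitoring component.
        for replica in self.replicas:
            if replica.name == name:
                replica.healthy = healthy

    def call_passive(self, request: str) -> str:
        # Passive strategy: use the first healthy replica as primary,
        # and fail over to the next one if the call fails.
        for replica in self.replicas:
            if not replica.healthy:
                continue
            try:
                return replica.handler(request)
            except ReplicaUnavailable:
                replica.healthy = False
                self.log.append(f"failover from {replica.name}")
        raise ReplicaUnavailable("no healthy replica")

    def call_active(self, request: str) -> List[str]:
        # Active strategy: invoke every healthy replica and collect
        # the answers of those that respond.
        results = []
        for replica in self.replicas:
            if not replica.healthy:
                continue
            try:
                results.append(replica.handler(request))
            except ReplicaUnavailable:
                replica.healthy = False
                self.log.append(f"failover from {replica.name}")
        if not results:
            raise ReplicaUnavailable("no healthy replica")
        return results
```

In this sketch the caller only ever talks to the proxy, so replica failures stay hidden as long as one healthy replica remains. A real system would also need to keep replica state consistent, which is outside what this example covers.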