A security management scheme for failure detector distributed systems based on self-tuning control theory

Authors:
Naixue Xiong;Jong Hyuk Park;Laurence T. Yang;Byoung-Soo Koh;Yingshu Li
Affiliations:
Department of Computer Science, Georgia State University, Atlanta, USA;Department of Computer Science and Engineering, Seoul National University of Technology, Seoul, Korea;Department of Computer Science, St. Francis Xavier University, Antigonish, Canada;DigiCAPS Co., Ltd., Seocho-Gu, Korea;Department of Computer Science, Georgia State University, Atlanta, USA
Venue:
Journal of Intelligent Manufacturing
Year:
2011

References:
Totem: a fault-tolerant multicast group communication system. Communications of the ACM.
Horus: a flexible group communication system. Communications of the ACM.
In search of clusters (2nd ed.).
On scalable and efficient distributed failure detectors. Proceedings of the twentieth annual ACM symposium on Principles of distributed computing.
Scalable flow control for multicast ABR services in ATM networks. IEEE/ACM Transactions on Networking (TON).
Reliable Distributed Computing with the ISIS Toolkit.
Failure Detectors as First Class Objects. DOA '99 Proceedings of the International Symposium on Distributed Objects and Applications.
A Fault Detection Service for Wide Area Distributed Computations. HPDC '98 Proceedings of the 7th IEEE International Symposium on High Performance Distributed Computing.
Failure Detectors for Large-Scale Distributed Systems. SRDS '02 Proceedings of the 21st IEEE Symposium on Reliable Distributed Systems.
ADAPTATION - Algorithms to ADAPTive FAulT MonItOriNg and Their Implementation on CORBA. DOA '01 Proceedings of the Third International Symposium on Distributed Objects and Applications.
Impact of a Failure Detection Mechanism on the Performance of Consensus. PRDC '01 Proceedings of the 2001 Pacific Rim International Symposium on Dependable Computing.
An Adaptive Failure Detection Protocol. PRDC '01 Proceedings of the 2001 Pacific Rim International Symposium on Dependable Computing.
RELACS: A Communications Infrastructure for Constructing Reliable Applications in Large-Scale Distributed Systems.
The ensemble system.
The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications.
Comparative Analysis of QoS and Memory Usage of Adaptive Failure Detectors. PRDC '07 Proceedings of the 13th Pacific Rim International Symposium on Dependable Computing.
A gossip-style failure detection service. Middleware '98 Proceedings of the IFIP International Conference on Distributed Systems Platforms and Open Distributed Processing.
Data transmission rate control in computer networks using neural predictive networks. ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications.

Abstract

Information security management has become an important research issue in distributed systems, and failure detection is fundamental to fault tolerance in large distributed systems. It is increasingly recognized that failure detection ought to be provided as a generic service, similar to IP address lookup. However, this has not yet succeeded, in part because classical failure detectors were not designed to satisfy several application requirements simultaneously. More specifically, traditional failure detector implementations are often tuned for local networks and fail to address important problems that arise in wide-area distributed systems with a large number of monitored components, such as Grid systems. In this paper, we study a security management scheme for failure detectors in distributed systems. We first identify some of the most important QoS problems raised in large wide-area distributed systems. We then present a novel failure detector scheme, combined with self-tuning control theory, that helps solve or mitigate some of these problems. Furthermore, we discuss the design and analysis of a scalable failure detection service for such systems that dynamically adjusts the heartbeat streams so that they satisfy the requirements of the bottleneck router. A basic z-transform stability test is used to derive the stability criterion, which ensures bounded rate allocation without steady-state oscillation. We further show how the online failure detector control algorithm can be used to design a controller, analyze the theoretical aspects of the proposed algorithm, and verify its agreement with simulations in both the LAN and WAN cases.
Simulation results show the efficiency of our scheme in terms of high utilization of the bottleneck link, fast response, and good stability of both the bottleneck router buffer occupancy and the controlled sending rates. In conclusion, the new security management failure detector algorithm provides better QoS than the algorithm proposed by Stelling et al. (Proceedings of the 7th IEEE Symposium on High Performance Distributed Computing, pp. 268-278, 1998) and Foster et al. (Int J Supercomput Appl, 2001).
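
The abstract does not give the controller itself, so the following is only a minimal sketch of the general idea: a heartbeat sending rate adjusted from bottleneck buffer occupancy, with a z-domain stability check on the control gain. Everything in it is an assumption made for illustration and not the authors' algorithm: the proportional control law, the one-period feedback delay, the second-order Jury test, and all names and parameter values (`gain`, `target`, `capacity`, `service_rate`, `period`).

```python
def jury_stable_second_order(a1, a0):
    """Jury test for z^2 + a1*z + a0.

    Returns True when both roots lie strictly inside the unit circle.
    """
    return abs(a0) < 1 and (1 + a1 + a0) > 0 and (1 - a1 + a0) > 0


def gain_is_stable(gain):
    """Stability of the assumed loop for a given proportional gain.

    With a one-period measurement delay, the buffer error obeys
    e[k+1] = e[k] - gain * e[k-1], so the characteristic polynomial
    is z^2 - z + gain.
    """
    return jury_stable_second_order(-1.0, gain)


def simulate(gain, steps=200, target=50.0, capacity=100.0,
             service_rate=100.0, period=0.1):
    """Simulate buffer occupancy under delayed proportional rate control.

    All parameters are hypothetical: target and capacity are in messages,
    service_rate in messages per second, period in seconds.
    """
    queue = [0.0]   # buffer occupancy at each control instant
    rates = []      # aggregate heartbeat sending rate chosen at each step
    for k in range(steps):
        # The controller only sees the occupancy from one period earlier.
        delayed = queue[max(k - 1, 0)]
        # Proportional law around the service rate; rates cannot be negative.
        rate = max(0.0, service_rate - (gain / period) * (delayed - target))
        # Buffer integrates the difference between arrivals and departures.
        nxt = queue[-1] + period * (rate - service_rate)
        queue.append(min(capacity, max(0.0, nxt)))
        rates.append(rate)
    return queue, rates


if __name__ == "__main__":
    for gain in (0.5, 1.2):
        queue, rates = simulate(gain)
        print(f"gain={gain} stable={gain_is_stable(gain)} "
              f"final_queue={queue[-1]:.3f} final_rate={rates[-1]:.3f}")
```

Under these assumptions the Jury conditions reduce to a gain strictly between 0 and 1. A gain inside that range drives the buffer occupancy to the target and the sending rate to the service rate, which is the kind of bounded, non-oscillating allocation the abstract describes. A gain outside the range fails the test.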