Analysis and optimization of service availability in a HA cluster with load-dependent machine availability

Authors:
Chee-Wei Ang; Chen-Khong Tham
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2007

Abstract

Calculations of the service availability of a High-Availability (HA) cluster are usually based on the assumption that machine availabilities are load-independent. In this paper, we study the issues that arise and show how service availabilities can be calculated when machine availabilities are load-dependent. We present a Markov chain analysis to derive the steady-state service availabilities of an HA cluster with load-dependent machine availability. We show that, with load-dependent machine availability, the attained service availability becomes policy-dependent. After formulating the problem as a Markov Decision Process (MDP), we determine the optimal policy that maximizes the service availabilities using policy iteration. Two greedy assignment algorithms are studied: least-load and FDL-based, where least-load corresponds to some existing load-balancing algorithms. We carry out analysis and simulations on two load profiles: in the first, a single machine has the capacity to host all services in the HA cluster; in the second, a single machine does not have enough capacity to host all services. We show that the service availabilities achieved under the first load profile are the same, while those achieved under the second load profile differ. Since the service availabilities differ under the second load profile, we further investigate how the distribution of service availabilities across services can be controlled by adjusting the rewards vector.
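To make the MDP formulation and policy iteration concrete, here is a minimal Python sketch of a toy cluster. It rests on assumptions that are not taken from the paper: two identical machines, two unit-load services, a hypothetical exponential failure law lambda(l) = LAMBDA0 * exp(BETA * l), exponential repairs at rate MU, and instantaneous, free reassignment, so the action (an assignment of services to up machines) can be re-chosen in every up/down state. The reward rate is the number of hosted services. The continuous-time chain is uniformized, and the resulting discrete chain has the same stationary distribution. The long-run gain divided by the number of services is therefore the mean service availability. All names (N_MACHINES, CAPACITY, least_load, policy_iteration, and so on) are illustrative. The least-load rule is one plausible reading of the greedy algorithm, and the FDL-based rule is not reproduced because the abstract does not define it.

```python
import itertools
import math

# Hypothetical toy model; none of these values come from the paper.
N_MACHINES = 2
N_SERVICES = 2
CAPACITY = 2      # services one machine can host; 2 mimics the first load profile
LAMBDA0 = 0.01    # assumed failure rate of an idle, up machine
BETA = 0.5        # assumed sensitivity of the failure rate to load
MU = 0.5          # assumed repair rate of a down machine
STATES = list(itertools.product((0, 1), repeat=N_MACHINES))  # 1 = machine up

def failure_rate(load):
    """Assumed load-dependent failure rate: LAMBDA0 * exp(BETA * load)."""
    return LAMBDA0 * math.exp(BETA * load)

LAM = 1.1 * N_MACHINES * max(failure_rate(CAPACITY), MU)  # uniformization rate > max outflow

def actions(state):
    """Feasible assignments: each service goes to an up machine or None (unhosted)."""
    targets = [m for m, up in enumerate(state) if up] + [None]
    return [a for a in itertools.product(targets, repeat=N_SERVICES)
            if all(a.count(m) <= CAPACITY for m in range(N_MACHINES))]

def reward(assign):
    """Reward rate = number of services currently hosted."""
    return sum(a is not None for a in assign)

def transition_probs(state, assign):
    """Uniformized one-step probabilities P(next | state, assign)."""
    probs, total = {}, 0.0
    for m, up in enumerate(state):
        nxt = list(state)
        nxt[m] = 1 - up
        rate = failure_rate(assign.count(m)) if up else MU  # fail if up, repair if down
        probs[tuple(nxt)] = rate / LAM
        total += rate
    probs[state] = 1.0 - total / LAM  # self-loop introduced by uniformization
    return probs

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def evaluate(policy):
    """Solve g + h(s) - sum_t P(t|s) h(t) = r(s) with h(STATES[0]) = 0; return (g, h)."""
    n = len(STATES)
    a, b = [[0.0] * n for _ in range(n)], [0.0] * n
    for i, s in enumerate(STATES):
        probs = transition_probs(s, policy[s])
        a[i][0] = 1.0  # column 0 holds the gain g
        for j in range(1, n):  # column j holds the bias h(STATES[j])
            a[i][j] = (1.0 if i == j else 0.0) - probs.get(STATES[j], 0.0)
        b[i] = reward(policy[s])
    x = solve(a, b)
    return x[0], {s: (0.0 if j == 0 else x[j]) for j, s in enumerate(STATES)}

def q_value(s, act, h):
    return reward(act) + sum(p * h[t] for t, p in transition_probs(s, act).items())

def policy_iteration(policy):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    while True:
        g, h = evaluate(policy)
        new = {}
        for s in STATES:
            best = max(actions(s), key=lambda act: q_value(s, act, h))
            # Keep the current action unless another is strictly better (avoids cycling).
            better = q_value(s, best, h) > q_value(s, policy[s], h) + 1e-12
            new[s] = best if better else policy[s]
        if new == policy:
            return g, policy
        policy = new

def least_load(state):
    """Greedy rule: each service goes to the least-loaded up machine with room."""
    loads, assign = [0] * N_MACHINES, []
    for _ in range(N_SERVICES):
        cands = [m for m in range(N_MACHINES) if state[m] and loads[m] < CAPACITY]
        m = min(cands, key=lambda k: loads[k]) if cands else None
        if m is not None:
            loads[m] += 1
        assign.append(m)
    return tuple(assign)

ll_policy = {s: least_load(s) for s in STATES}
g_ll, _ = evaluate(ll_policy)
g_opt, opt_policy = policy_iteration(ll_policy)
print(f"least-load: {g_ll / N_SERVICES:.6f}, optimal: {g_opt / N_SERVICES:.6f}")
```

Starting policy iteration from the least-load policy guarantees that the returned gain is at least as large as the greedy one. Setting CAPACITY = 1 gives a toy analogue of the second load profile, where no single machine can host every service. Because every machine always has a positive failure and repair rate in this toy model, every policy yields an irreducible chain, which is what lets the simple unichain form of average-reward policy iteration apply. The paper's own state and action definitions may differ.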