Calculations of the service availability of a High-Availability (HA) cluster are usually based on the assumption that machine availabilities are load-independent. In this paper, we study the issues that arise when this assumption is dropped and show how service availabilities can be calculated when machine availabilities are load-dependent. We present a Markov chain analysis to derive the steady-state service availabilities of an HA cluster with load-dependent machine availability. We show that, with load-dependent machine availability, the attained service availability becomes policy-dependent. After formulating the problem as a Markov Decision Process, we determine the optimal policy that maximizes service availabilities using policy iteration. Two greedy assignment algorithms are studied: least-load and FDL-based, where least-load corresponds to some load-balancing algorithms. We carry out analysis and simulations on two load profiles: in the first, a single machine has the capacity to host all services in the HA cluster; in the second, it does not. We show that the service availabilities achieved under the first load profile are the same, while those achieved under the second load profile differ. Because they differ under the second load profile, we then investigate how the distribution of service availabilities across the services can be controlled by adjusting the rewards vector.
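As a rough illustration of what load-dependent machine availability means, the sketch below computes the steady-state availability of a single machine modeled as a two-state (up/down) continuous-time Markov chain. It is not the paper's model: the exponential form of the failure rate, the function names, and all parameter values (`base_rate`, `alpha`, `repair_rate`) are assumptions chosen only for illustration.

```python
import math


def failure_rate(load, base_rate=0.001, alpha=0.5):
    """Hypothetical failure rate that grows exponentially with the load.

    `load` is the number of services hosted on the machine; the exponential
    form and the default values are illustrative assumptions.
    """
    return base_rate * math.exp(alpha * load)


def machine_availability(load, repair_rate=0.1, base_rate=0.001, alpha=0.5):
    """Steady-state probability that the machine is up.

    For a two-state chain with failure rate lam and repair rate mu, the
    balance equation pi_up * lam = pi_down * mu together with
    pi_up + pi_down = 1 gives pi_up = mu / (mu + lam).
    """
    lam = failure_rate(load, base_rate, alpha)
    return repair_rate / (repair_rate + lam)


# Availability of one machine as it hosts 0, 1, 2, or 3 services.
availabilities = [machine_availability(k) for k in range(4)]
for k, a in enumerate(availabilities):
    print(f"load={k}: availability={a:.6f}")
```

Under these assumptions the availability falls as more services are placed on the machine, which is why the choice of assignment policy can change the service availability that is attained.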