Abstract

This paper discusses resource allocation for a distributed system that uses the primary site approach for fault tolerance. Two kinds of systems are considered. The first consists of fault-tolerant nodes, each with several duplicated servers: one server is the primary, which serves user requests, and the rest are backups. The second has no fault-tolerant nodes; to tolerate node failures, each node uses other nodes as backups, and when a node fails, all requests initially allocated to it are served by one of its backups. To study resource allocation in such systems, an approximate model is developed for each. Using these models, efficient allocation algorithms are presented that take into account the system's failure and repair rates and the overhead of fault tolerance. Experimental results show that the algorithms give optimal or near-optimal allocations. The algorithms incur little overhead and can significantly improve system performance over an intuitive allocation algorithm.
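As a rough illustration of how failure and repair rates can enter an allocation decision, the sketch below weights each node's service capacity by its steady-state availability and splits the request stream in proportion. This is not the algorithm from the paper, which is not reproduced in the abstract. The two-state (up/down) node model, the function names, and the dictionary keys are all assumptions made for the example.

```python
def availability(failure_rate, repair_rate):
    """Steady-state availability of a node that alternates between up and
    down states with exponential failure and repair times (an assumed model)."""
    return repair_rate / (failure_rate + repair_rate)


def proportional_allocation(total_rate, nodes):
    """Split total_rate across nodes in proportion to
    availability * service_rate (a simple heuristic, not the paper's method).

    Each node is a dict with hypothetical keys:
    'service_rate', 'failure_rate', 'repair_rate'.
    """
    weights = [
        availability(n["failure_rate"], n["repair_rate"]) * n["service_rate"]
        for n in nodes
    ]
    total_weight = sum(weights)
    return [total_rate * w / total_weight for w in weights]


if __name__ == "__main__":
    nodes = [
        {"service_rate": 10.0, "failure_rate": 0.01, "repair_rate": 0.5},
        {"service_rate": 10.0, "failure_rate": 0.10, "repair_rate": 0.5},
    ]
    # The node that fails more often receives the smaller share.
    print(proportional_allocation(12.0, nodes))
```

A heuristic like this ignores queueing delay and the cost of switching to a backup, which are the kinds of effects an approximate system model would need to capture.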

Resource Allocation for Primary-Site Fault-Tolerant Systems
