In this paper we study a fault-tolerant model for Grid environments based on task replication. The basic idea is to produce multiple replicas of a given task and submit them to the Grid, assuming that the failure probability of each replica is known a priori. We introduce a scheme for calculating the number of replicas when the replicas have different failure probabilities, and we propose an efficient resource management scheme, based on the fair share technique, that handles the task replicas so that fault tolerance is maintained fairly across the Grid. We conclude with simulation results that validate the proposed scheme.
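
To make the replica-count idea concrete, here is a minimal sketch, not the paper's actual scheme. It assumes replicas fail independently, so a task is lost only if every replica fails, with probability equal to the product of the individual failure probabilities. The function name `replicas_needed`, the `target_success` parameter, and the choice to place replicas on the most reliable resources first are all illustrative assumptions.

```python
def replicas_needed(failure_probs, target_success):
    """Smallest number of replicas whose combined success probability
    reaches target_success, or None if the target is unreachable.

    Assumption (not from the paper): replicas fail independently, so
    P(all k replicas fail) = p_1 * p_2 * ... * p_k.
    """
    if not 0.0 < target_success < 1.0:
        raise ValueError("target_success must lie strictly between 0 and 1")
    if any(not 0.0 <= p <= 1.0 for p in failure_probs):
        raise ValueError("failure probabilities must lie in [0, 1]")

    all_fail = 1.0
    # Hypothetical policy: use the most reliable candidates first.
    for k, p in enumerate(sorted(failure_probs), start=1):
        all_fail *= p
        if 1.0 - all_fail >= target_success:
            return k
    return None


if __name__ == "__main__":
    # Candidate resources with differing a priori failure probabilities.
    candidates = [0.5, 0.1, 0.2]
    print(replicas_needed(candidates, 0.95))
```

Under these assumptions, the less reliable the available resources, the more replicas a task needs to reach the same target. This is why a resource manager has to arbitrate replica placement among competing tasks.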