The major purpose of a Grid is to federate multiple powerful resources into a single virtual entity that external users can access transparently and efficiently. Because a Grid is usually an unreliable system involving heterogeneous resources located in different geographical domains, distributed and fault-tolerant resource allocation services have to be provided. In particular, when a crash occurs, tasks have to be reallocated quickly and automatically, in a way that is completely transparent to users. This paper presents Paradis, an adaptive middleware based on a set of basic agreement services, which has been integrated within an experimental Grid dedicated to genomic applications.