GRM: a reliable and fault tolerant data replication middleware for grid environment
Proceedings of the International Conference & Workshop on Emerging Trends in Technology
Hi-index | 0.00 |
A major challenge in a dynamic Grid with thousands of machines connected toeach other is fault tolerance. The more resources and components involved, themore complicated and error-prone becomes the system. Migol is an adaptive Grid middleware,which addresses the fault tolerance of Grid applications and services by providing the capability to recover applications from checkpoint files automatically. A critical aspect for an automatic recovery is the availability of checkpoint files: If a resource becomes unavailable, it is very likely that the associated storage is also unreachable, e. g. due to a network partition. A strategy to increase the availability of checkpoints isreplication.In this paper, we present the Checkpoint Replication Service. A key feature of this service is the ability to automatically replicate and monitor checkpoints in the Grid.