The mobile agent paradigm has emerged as a promising way to address the challenges of building opportunistic grid environments. It can be used to implement mechanisms that let applications keep making progress in the presence of failures, such as those provided by the MAG (Mobile Agents for Grids) middleware. MAG offers retrying, replication, and checkpointing as fault-tolerance techniques, but they operate independently of one another and cannot detect changes in resource availability. In this paper, we describe a MAG extension that migrates agents when nodes fail, optimizes application progress by keeping only the most advanced checkpoint, and migrates slow replicas.
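
To make the combination of checkpointing and replication more concrete, the following is a minimal sketch, not MAG's actual implementation. It assumes that each replica reports a numeric progress value with its checkpoint, and that a replica counts as slow once it trails the best checkpoint by a fixed threshold. The names `Checkpoint`, `ReplicaCoordinator`, `lag_threshold`, and `progress` are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Checkpoint:
    replica_id: str
    progress: int   # assumed progress measure, e.g. completed iterations
    state: bytes    # opaque serialized agent state


@dataclass
class ReplicaCoordinator:
    # Assumption: a replica lagging the best checkpoint by at least
    # this many progress units is treated as slow.
    lag_threshold: int = 100
    best: Optional[Checkpoint] = None
    latest_progress: Dict[str, int] = field(default_factory=dict)

    def report(self, cp: Checkpoint) -> bool:
        """Record a checkpoint; return True if it replaced the stored one."""
        self.latest_progress[cp.replica_id] = cp.progress
        if self.best is None or cp.progress > self.best.progress:
            self.best = cp  # the previously stored checkpoint is discarded
            return True
        return False

    def slow_replicas(self) -> List[str]:
        """Replicas that are candidates for migration to a faster node."""
        if self.best is None:
            return []
        return sorted(
            rid for rid, p in self.latest_progress.items()
            if self.best.progress - p >= self.lag_threshold
        )

    def recovery_point(self) -> Optional[Checkpoint]:
        """Checkpoint from which an agent would restart after a node failure."""
        return self.best
```

In this sketch, a failed node's agent would be restarted elsewhere from `recovery_point()`, and replicas listed by `slow_replicas()` would be migrated; how a target node is selected is left out because the abstract does not describe it.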