In a peer-to-peer grid computing environment, volunteers are exposed to failures such as crash failures and link failures. Moreover, because volunteers can join and leave executions dynamically and are not dedicated solely to peer-to-peer grid computing, their executions are stopped or suspended more frequently than in a grid computing environment. These failures delay or block the execution of tasks and can even cause partial or complete loss of the executions. They also make it difficult for a volunteer server to schedule tasks and to manage both the allocated tasks and the volunteers. Existing peer-to-peer grid computing systems, however, do not account for these failures in their scheduling mechanisms. Furthermore, because existing scheduling mechanisms are performed solely by a volunteer server in a centralized way, they incur high overhead. To solve these problems, we propose a mobile agent based adaptive scheduling mechanism (MAASM). We implemented MAASM in Korea@Home and the ODDUGI mobile agent system. MAASM reduces the overhead of the volunteer server by using mobile agents to carry out the scheduling procedure in a distributed way. It also tolerates the various failures (especially volunteer autonomy failures) that frequently occur in a peer-to-peer grid computing environment. Consequently, MAASM guarantees reliable and continuous execution in spite of these failures and thereby decreases total execution time.