In a grid environment, some nodes may be working while others are inactive. Alternatively, all the computers may be operational while the network that interconnects them fails. From the perspective of one computer, such a network partition may appear as the failure of the other computers. Failures of this kind can have a major impact on an application that runs on the Grid for many days. In this paper we meta-model the computational grid and implement a fault-tolerance mechanism using Java agents. The purpose of the proposed work is to automate the development of a computational grid and to create graphical workflows of applications using domain-specific modelling techniques. The paper aims to provide a high-level view of the construction of Grid applications, with flexibility in design and deployment.