The combination of grid technology and Web services has produced an attractive platform for deploying distributed applications: grid services, as represented by the Open Grid Services Infrastructure (OGSI) and its Globus Toolkit implementation. As grid services grow in popularity, tolerating failures becomes increasingly important. This work addresses the problem of building a reliable and highly available grid service by replicating the service on two or more hosts using the primary-backup approach. The primary goal is to evaluate how easily and efficiently this can be done, first by designing a primary-backup protocol using OGSI, and then by implementing it using Globus to evaluate the performance implications and tradeoffs. We compared three implementations: one that makes heavy use of the notification interface defined in OGSI, one that uses standard grid service requests instead of notification, and one that uses low-level socket primitives. The overall conclusion is that, although the performance penalty of using Globus primitives (especially notification) for replica coordination can be significant, the OGSI model is suitable for building highly available services and makes the task of engineering such services easier.
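The primary-backup approach mentioned above can be illustrated with a minimal in-process sketch in Python. It is not the protocol from this work and uses no OGSI or Globus API: the class names `Replica` and `ReplicatedService` are hypothetical, and a direct method call stands in for whichever transport (notification, grid service request, or socket) carries updates from the primary to the backups. The sketch assumes crash failures only and a single client.

```python
class Replica:
    """One copy of the service state (hypothetical, in-process stand-in for a host)."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.alive = True

    def apply(self, key, value):
        # A crashed replica cannot accept updates.
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.state[key] = value


class ReplicatedService:
    """Primary-backup wrapper: replicas[0] is the primary, the rest are backups."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    @property
    def primary(self):
        return self.replicas[0]

    def _fail_over(self):
        # Promote the next live replica if the current primary has crashed.
        while self.replicas and not self.replicas[0].alive:
            self.replicas.pop(0)
        if not self.replicas:
            raise RuntimeError("no live replica")

    def put(self, key, value):
        self._fail_over()
        primary = self.primary
        primary.apply(key, value)
        # Forward the update to every backup before replying to the client,
        # so that an acknowledged update survives a primary crash.
        live_backups = []
        for backup in self.replicas[1:]:
            try:
                backup.apply(key, value)
                live_backups.append(backup)
            except ConnectionError:
                pass  # drop the failed backup from the replica group
        self.replicas = [primary] + live_backups
        return "ok"

    def get(self, key):
        self._fail_over()
        return self.primary.state.get(key)


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    service = ReplicatedService([a, b])
    service.put("x", 1)
    a.alive = False            # simulate a crash of the primary
    print(service.get("x"))    # served by the promoted backup
```

In a real deployment the forwarding step inside `put` is where the choice of coordination mechanism matters, which is the cost that the three implementations compared above measure.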