A major challenge facing grid applications is the appropriate handling of failures. In this paper, we address the problem of making parallel Java applications based on Remote Method Invocation (RMI) fault tolerant in a way that is transparent to the programmer. We use globally consistent checkpointing so that long-running computations do not have to be restarted from scratch after a system crash. The application's execution state can be captured at any time, even when some of the application's threads are blocked waiting for the result of a (nested) remote method call. We modify only the program's bytecode, which makes our solution independent of any particular Java Virtual Machine (JVM) implementation. The bytecode transformation algorithm performs a compile-time analysis to reduce the number of modifications to the application's code, which directly affects the application's performance. The fault-tolerance extensions also cover RMI components such as the RMI registry. Because essential data such as checkpoints are replicated, our system is resilient to simultaneous failures of multiple machines. Experimental results show that our fault-tolerance extensions incur negligible performance overhead.
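The idea of capturing execution state so that a computation can resume after a crash can be illustrated with a small, heavily simplified sketch. It is not the transformation described in the abstract: that one rewrites bytecode and handles multiple threads and RMI calls, whereas this sketch is hand-written at source level, single-threaded, and keeps the checkpoint in memory. All names here (`CheckpointSketch`, `Frame`, `sumTo`, `checkpointAt`, `crashAt`) are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CheckpointSketch {

    /** Hypothetical saved frame: the local variables of sumTo() at the capture point. */
    static final class Frame implements Serializable {
        private static final long serialVersionUID = 1L;
        final int i;
        final long sum;
        Frame(int i, long sum) { this.i = i; this.sum = sum; }
    }

    /** Serialized form of the most recent checkpoint; null until one is taken. */
    static byte[] lastCheckpoint = null;

    /** Serializes a frame with standard Java object serialization. */
    static byte[] save(Frame f) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(f);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds a frame from its serialized form. */
    static Frame load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Frame) in.readObject();
        }
    }

    /**
     * Sums 0..n-1. Takes a checkpoint when the loop reaches checkpointAt and
     * simulates a crash (by throwing) when it reaches crashAt; pass -1 to
     * disable either. If restored is non-null, the locals are reloaded from
     * it instead of starting from zero.
     */
    static long sumTo(int n, Frame restored, int checkpointAt, int crashAt) throws IOException {
        int i = (restored != null) ? restored.i : 0;
        long sum = (restored != null) ? restored.sum : 0L;
        for (; i < n; i++) {
            if (i == checkpointAt) {
                // Capture the locals as they are before this iteration runs.
                lastCheckpoint = save(new Frame(i, sum));
            }
            if (i == crashAt) {
                throw new IllegalStateException("simulated crash at i=" + i);
            }
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        long uninterrupted = sumTo(100, null, -1, -1);
        lastCheckpoint = null;
        try {
            sumTo(100, null, 40, 70); // checkpoint at i=40, crash at i=70
        } catch (IllegalStateException crash) {
            // The work done since the checkpoint is lost; recover below.
        }
        long recovered = sumTo(100, load(lastCheckpoint), -1, -1);
        System.out.println(uninterrupted + " " + recovered);
    }
}
```

In this sketch the restarted run repeats only the iterations after the last checkpoint, which is the benefit the abstract claims for long-running computations. A real system would also have to store the checkpoint off the failing machine and coordinate checkpoints across processes so that the saved global state is consistent.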