Scientific workflow systems often operate in unreliable environments and have accordingly incorporated various fault tolerance techniques. One of these is checkpointing, combined with its corresponding rollback recovery process. Checkpointing schemes have been developed at two levels: task (or activity) level and workflow level. At workflow level, the usual approach is to set a system-wide checkpointing frequency that determines when a global workflow checkpoint, that is, a snapshot of the whole workflow enactment state during normal (failure-free) execution, is taken. We describe an alternative workflow-level checkpointing scheme, and its corresponding rollback recovery process, for hierarchical scientific workflows, in which every workflow node in the hierarchy takes its own local checkpoint autonomously and in an uncoordinated way after its enactment. In contrast to other proposals, we use the Reference net formalism to express the scheme. Reference nets are a particular type of Petri net that can more effectively provide the abstractions needed to support and express hierarchical workflows and their dynamic adaptability.
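
To make the idea of uncoordinated, per-node checkpointing more concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions and not the Reference net formulation the abstract refers to: the `WorkflowNode` class, the `enact` function, the path-style checkpoint keys, and the in-memory `store` dictionary (standing in for persistent storage) are all hypothetical names introduced here. Each node saves its own result as soon as its enactment ends, and on re-enactment after a failure, any node that already has a checkpoint is skipped.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class WorkflowNode:
    """Hypothetical node: a leaf carries an action, a composite carries children."""
    name: str
    action: Optional[Callable[[], Any]] = None
    children: List["WorkflowNode"] = field(default_factory=list)


def enact(node: WorkflowNode, store: Dict[str, Any], parent: str = "") -> Any:
    """Enact a node, reusing its local checkpoint if one exists (assumed scheme)."""
    key = f"{parent}/{node.name}"
    if key in store:
        # Rollback recovery: this node completed in an earlier run, so skip it.
        return store[key]
    if node.children:
        result = [enact(child, store, key) for child in node.children]
    else:
        result = node.action()
    # Local checkpoint, taken by the node itself right after its enactment,
    # without coordinating with any other node.
    store[key] = result
    return result


# --- Demo: task "c" fails on its first execution only -----------------------
executed: List[str] = []          # records every task execution, in order
fail_once = {"pending": True}


def make_task(name: str, value: int) -> Callable[[], int]:
    def run() -> int:
        executed.append(name)
        return value
    return run


def flaky() -> int:
    executed.append("c")
    if fail_once["pending"]:
        fail_once["pending"] = False
        raise RuntimeError("simulated failure")
    return 3


workflow = WorkflowNode("root", children=[
    WorkflowNode("stage1", children=[
        WorkflowNode("a", make_task("a", 1)),
        WorkflowNode("b", make_task("b", 2)),
    ]),
    WorkflowNode("stage2", children=[
        WorkflowNode("c", flaky),
        WorkflowNode("d", make_task("d", 4)),
    ]),
])

store: Dict[str, Any] = {}        # stands in for persistent checkpoint storage
try:
    enact(workflow, store)        # first run: fails inside stage2
except RuntimeError:
    pass
result = enact(workflow, store)   # recovery run: stage1 is not re-enacted
```

In this sketch the first run checkpoints `a`, `b`, and the composite `stage1` before `c` fails; the recovery run restores `stage1` from its checkpoint and only re-executes `c` and then `d`. No global snapshot or checkpointing frequency is involved, which is the contrast with the usual workflow-level approach that the abstract draws.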