The problem of recovering distributed systems from crash failures has been widely studied in the context of traditional non-threaded processes. However, extending those solutions to the multi-threaded scenario presents new problems. We identify and address these problems for optimistic logging protocols. There are two natural extensions of optimistic logging protocols to the multi-threaded scenario. The first extension is "process-centric", where the points of internal non-determinism caused by threads are logged. The second extension is "thread-centric", where each thread is treated as a separate process. The process-centric approach suffers from false causality, while the thread-centric approach suffers from high causality-tracking overhead. By observing that the granularity of failures can be different from the granularity of rollbacks, we design a new "balanced" approach that incurs low causality-tracking overhead and also eliminates false causality.
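The false causality mentioned above can be illustrated with a toy model. The sketch below is not the protocol from the paper: it assumes dependencies are plain sets of sender names (real optimistic logging protocols track richer information), and the class names, thread names "t1" and "t2", and process name "P" are all hypothetical. It only contrasts tracking at process granularity with tracking at thread granularity.

```python
class ProcessCentric:
    """Toy tracker: one dependency set shared by every thread of the process."""

    def __init__(self):
        self.deps = set()

    def receive(self, thread, sender):
        # Any receive, by any thread, taints the whole process.
        self.deps.add(sender)

    def send(self, thread):
        # Every outgoing message carries the process-wide dependencies.
        return frozenset(self.deps)


class ThreadCentric:
    """Toy tracker: each thread keeps its own dependency set."""

    def __init__(self):
        self.deps = {}  # thread name -> set of senders it depends on

    def receive(self, thread, sender):
        self.deps.setdefault(thread, set()).add(sender)

    def send(self, thread):
        # Only the sending thread's own dependencies are attached.
        return frozenset(self.deps.get(thread, set()))


def scenario(tracker):
    """t1 receives from P; t2, which never interacted with t1, then sends."""
    tracker.receive("t1", "P")
    return tracker.send("t2")


process_view = scenario(ProcessCentric())
thread_view = scenario(ThreadCentric())

print("P" in process_view)  # True: t2's message appears to depend on P
print("P" in thread_view)   # False: no dependency is recorded for t2
```

In this toy scenario, if P later fails and rolls back, the process-centric tracker would mark the message from t2 as an orphan even though t2 never observed anything from P. The thread-centric tracker avoids that, but it keeps one entry per thread, so the tracked state grows with the number of threads, which loosely mirrors the overhead the abstract attributes to that approach.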