Concurrent programs often encounter failures, such as races, owing to the presence of synchronization faults (bugs). One existing technique to tolerate synchronization faults is to roll back the program to a previous state and re-execute, in the hope that the failure does not recur. Instead of relying on chance, our approach is to control the re-execution in order to avoid a recurrence of the synchronization failure. The control is achieved by tracing information during an execution and using this information to add synchronizations during the re-execution. The approach gives rise to a general problem, called the off-line predicate control problem, which takes a computation and a property specified on the computation, and outputs a "controlled" computation that maintains the property. We solve the predicate control problem for the mutual exclusion property, which is especially important in synchronization fault tolerance.
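
To make the idea of adding synchronizations from a trace concrete, here is a minimal sketch in Python. It is not the algorithm from the paper. It assumes a deliberately simplified model in which each traced critical section is an atomic unit, and in which the trace already yields a set of pairs `(a, b)` meaning "section `a` finished before section `b` started". The function name `control_mutex` and the section labels are hypothetical. Under those assumptions, the sketch picks a total order consistent with the traced order and reports the extra orderings a controlled re-execution would have to enforce.

```python
from collections import defaultdict, deque


def control_mutex(sections, before):
    """Return extra (a, b) orderings that serialize all critical sections.

    sections: list of critical-section ids taken from the trace (hypothetical labels).
    before:   set of (a, b) pairs already ordered in the traced computation.
    Returns None if the traced orderings form a cycle, in which case no
    serialization exists under this simplified model.
    """
    successors = defaultdict(set)
    indegree = {s: 0 for s in sections}
    for a, b in before:
        if b not in successors[a]:
            successors[a].add(b)
            indegree[b] += 1

    # Kahn's topological sort: repeatedly emit a section with no pending predecessor.
    ready = deque(s for s in sections if indegree[s] == 0)
    order = []
    while ready:
        a = ready.popleft()
        order.append(a)
        for b in sorted(successors[a]):
            indegree[b] -= 1
            if indegree[b] == 0:
                ready.append(b)

    if len(order) != len(sections):
        return None  # cycle among critical sections

    # Chain consecutive sections; keep only the orderings the trace did not already have.
    return [(a, b) for a, b in zip(order, order[1:]) if (a, b) not in before]


# Two sections on process p0 (ordered by program order) and one on p1.
added = control_mutex(["p0.a", "p1.a", "p0.b"], {("p0.a", "p0.b")})
print(added)
```

Each returned pair stands for one added synchronization, for example a control message that the process running `b` waits for before entering its critical section. A real solution works on the full computation rather than on atomic sections, so this should be read only as an illustration of the input and output of the problem.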