Partial orders for parallel debugging
PADD '88 Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging
A general-purpose algorithm for analyzing concurrent programs
Communications of the ACM
Consistent detection of global predicates
PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
The derivation of distributed termination detection algorithms from garbage collection schemes
ACM Transactions on Programming Languages and Systems (TOPLAS)
Distributed snapshots: determining global states of distributed systems
ACM Transactions on Computer Systems (TOCS)
Experiences with building distributed debuggers
SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Fundamentals of fault-tolerant distributed computing in asynchronous environments
ACM Computing Surveys (CSUR)
Debugging distributed programs using controlled re-execution
Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Lamport on mutual exclusion: 27 years of planting seeds
Proceedings of the twentieth annual ACM symposium on Principles of distributed computing
A Roll-Forward Recovery Scheme for Solving the Problem of Coasting Forward for Distributed Systems
ACM SIGOPS Operating Systems Review
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
Addressing False Causality while Detecting Predicates in Distributed Programs
ICDCS '98 Proceedings of the The 18th International Conference on Distributed Computing Systems
On Slicing a Distributed Computation
ICDCS '01 Proceedings of the The 21st International Conference on Distributed Computing Systems
Causal distributed assert statements
Causal distributed assert statements
Detecting causal relationships in distributed computations: in search of the holy grail
Distributed Computing
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part IV
Hi-index | 0.00 |
Capturing and examining the causal and concurrent relationships of a distributed system is essential to a wide range of distributed systems applications. Many approaches to gathering this information rely on trace files of executions. The information obtained through tracing is limited to those executions observed. We present a methodology that analyzes the source code of the distributed system. Our analysis considers each process's source code and produces a single comprehensive graph of the system's possible behaviors. The graph, termed the partial order graph (POG), uniquely represents each possible partial order of the system. Causal and concurrent relationships can be extracted relative either to a particular partial order, which is synonymous to a single execution, or to a collection of partial orders. The graph provides a means of reasoning about the system in terms of relationships that will definitely occur, may possible occur, and will never occur. Distributed assert statements provide a means to monitor distributed system executions. By constructing the POG prior to system execution, the causality information provided by the POG enables run-time evaluation of the assert statement without relying on traces or addition messages.