Handling Timing Errors in Distributed Programs

Authors:
A. J. Gordon;R. A. Finkel
Affiliations:
Colorado School of Mines, Golden, CO;Univ. of Kentucky, Lexington, KY
Venue:
IEEE Transactions on Software Engineering
Year:
1988

Abstract

The authors describe a tool called TAP, which is designed to aid the programmer in discovering the causes of timing errors in running programs. TAP is similar to a postmortem debugger: it uses the history of interprocess communication to construct a timing graph, a directed graph in which an edge joins node x to node y if event x directly precedes event y in time. The programmer can then use TAP to examine the graph and find the events that occurred in an unacceptable order. Because distributed programs are nondeterministic, the authors feel that a history-keeping mechanism must always be active so that bugs can be dealt with as they occur. The goal is to collect enough information at run time to construct the timing graph if needed. Since it is always active, this mechanism must be efficient. The authors also describe experiments run using TAP and report the impact that TAP's history-keeping mechanism has on the running time of various distributed programs.
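
The timing graph described in the abstract can be illustrated with a minimal sketch. This is not TAP's implementation, which the abstract does not detail. It assumes a hypothetical `Event` record (process name, local sequence number, kind, message id) and assumes that "directly precedes" means either consecutive events in one process or a send and its matching receive. All names here are illustrative.

```python
from collections import defaultdict, deque
from typing import NamedTuple, Optional


class Event(NamedTuple):
    """Hypothetical history record; TAP's actual record format is not given."""
    process: str                # name of the process that logged the event
    seq: int                    # position in that process's local history
    kind: str                   # "send", "recv", or "local"
    msg: Optional[str] = None   # message id linking a send to its receive


def build_timing_graph(history):
    """Return adjacency lists with an edge x -> y when x directly precedes y."""
    graph = defaultdict(list)
    by_process = defaultdict(list)
    sends = {}
    for ev in history:
        graph[ev]  # touch the key so every event becomes a node
        by_process[ev.process].append(ev)
        if ev.kind == "send":
            sends[ev.msg] = ev
    # Assumption 1: consecutive events in the same process are ordered.
    for events in by_process.values():
        events.sort(key=lambda e: e.seq)
        for earlier, later in zip(events, events[1:]):
            graph[earlier].append(later)
    # Assumption 2: a send directly precedes its matching receive.
    for ev in history:
        if ev.kind == "recv" and ev.msg in sends:
            graph[sends[ev.msg]].append(ev)
    return graph


def precedes(graph, x, y):
    """Breadth-first search: True if some path leads from x to y."""
    seen, queue = {x}, deque([x])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == y:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# A small invented history: A sends m1 to B, B replies with m2, C works alone.
a_send = Event("A", 0, "send", "m1")
b_recv = Event("B", 0, "recv", "m1")
b_send = Event("B", 1, "send", "m2")
a_recv = Event("A", 1, "recv", "m2")
c_init = Event("C", 0, "local")
history = [a_send, b_recv, b_send, a_recv, c_init]
graph = build_timing_graph(history)

print(precedes(graph, a_send, b_send))  # True: a_send -> b_recv -> b_send
print(precedes(graph, b_send, a_send))  # False: no path back in time
```

In this sketch, two events with no path in either direction (such as `a_send` and `c_init`) are unordered, which is where an unexpected interleaving could show up as a timing error.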