We present a case study that illustrates a method of debugging concurrent processes in a parallel programming environment. The method uses a new approach called speculative replay to reconstruct the behavior of a program from the histories of its individual processes. Known time dependencies between events in different processes are used to divide the histories into dependence blocks. A graphical representation called a concurrency map displays the possibilities for concurrency among processes. The replay technique preserves the known dependencies and compares the process histories generated during replay with those that were logged during the original program execution. If a process generates a replay history that does not match its original history, replay backs up. An alternative ordering of events is then created and tested to see whether it produces process histories that match the original histories. Successively more controlled replay sequences are generated by introducing additional dependencies. We describe ongoing work on tools that will control replay without reconstructing the entire space of possible event orderings.

The case study presents a miniature example of shared-queue management that can be examined in detail. It demonstrates the replay technique and the construction and use of the concurrency map. Using our techniques, we detect a failure to which a standard algorithm for shared-queue management is susceptible.
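The replay loop described above can be illustrated with a small sketch. This is not the authors' tool: it is a minimal backtracking search written under simplifying assumptions, namely that each process is a fixed list of operations on one shared FIFO queue, that a process history is the list of values the process observed, and that known dependencies are given as pairs of events. The names `apply_op`, `speculative_replay`, `programs`, `logged`, and `deps` are hypothetical and chosen only for this illustration.

```python
from collections import deque


def apply_op(op, queue):
    """Run one operation on the shared queue and return what the process observed."""
    kind, arg = op
    if kind == "enq":
        queue.append(arg)
        return ("enq", arg)
    # "deq": observe the head of the queue, or None if the queue is empty
    return ("deq", queue.popleft() if queue else None)


def speculative_replay(programs, logged, deps=frozenset()):
    """Search for an event ordering whose per-process histories match `logged`.

    programs: {process: [op, ...]} (assumed fixed operation lists)
    logged:   {process: [observation, ...]} recorded during the original run
    deps:     set of ((p, i), (q, j)) pairs: event i of p must precede event j of q
    Returns a list of (process, index) pairs, or None if no ordering matches.
    """
    procs = sorted(programs)

    def search(pos, queue, order):
        if all(pos[p] == len(programs[p]) for p in procs):
            return order  # every process reproduced its full logged history
        for p in procs:
            i = pos[p]
            if i == len(programs[p]):
                continue  # this process has no events left
            # Preserve known dependencies: each predecessor must already have run.
            if any(after == (p, i) and pos[before[0]] <= before[1]
                   for before, after in deps):
                continue
            trial = deque(queue)  # work on a copy so backing up discards it
            if apply_op(programs[p][i], trial) != logged[p][i]:
                continue  # replay history diverges from the log: back up
            found = search({**pos, p: i + 1}, trial, order + [(p, i)])
            if found is not None:
                return found
        return None  # no alternative ordering from this point matches

    return search({p: 0 for p in procs}, deque(), [])


# A producer P and a consumer C; C's first dequeue found the queue empty.
programs = {"P": [("enq", 1), ("enq", 2)], "C": [("deq", None), ("deq", None)]}
logged = {"P": [("enq", 1), ("enq", 2)], "C": [("deq", None), ("deq", 1)]}
order = speculative_replay(programs, logged)
```

Passing an extra dependency through `deps` narrows the orderings that are tried, which loosely mirrors the idea of generating successively more controlled replay sequences. The sketch explores orderings exhaustively in the worst case, so it is only suitable for miniature examples like the one above.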