This paper discusses design and implementation issues for EREBUS, a distributed debugger that is part of a programming environment for distributed programs written in Estelle, an ISO-standardized language. Problems pertaining to execution replay of distributed programs are discussed in detail, and the performance of the prototype debugger is reported.