Implementation of resilient, atomic data types
ACM Transactions on Programming Languages and Systems (TOPLAS) - Lecture notes in computer science Vol. 174
Reliable communication in the presence of failures
ACM Transactions on Computer Systems (TOCS)
SOSP '87 Proceedings of the eleventh ACM Symposium on Operating systems principles
High-Availability Computer Systems
Computer
Rover: a toolkit for mobile information access
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Managing update conflicts in Bayou, a weakly connected replicated storage system
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Mobile Computing with the Rover Toolkit
IEEE Transactions on Computers - Special issue on mobile computing
The Java programming language (2nd ed.)
Implementing remote procedure calls
ACM Transactions on Computer Systems (TOCS)
Transaction Processing: Concepts and Techniques
Operating system support for mobile agents
HOTOS '95 Proceedings of the Fifth Workshop on Hot Topics in Operating Systems (HotOS-V)
Design and Performance of Horus: A Lightweight Group Communications System
Client-server computing in mobile environments
ACM Computing Surveys (CSUR)
This paper discusses extensions to the Rover toolkit for constructing reliable mobile-aware applications. The extensions improve upon the existing failure model, which addresses client and communication failures and guarantees reliable message delivery from clients to servers, but does not address server failures (e.g., the loss of an incoming message due to a server failure) (Joseph et al., 1997). Because communication connectivity in mobile client environments is typically unpredictable and intermittent, it is inappropriate to make clients responsible for guaranteeing request completion at servers. The extensions discussed in this paper provide both system- and language-level support for reliable operation in the form of stable logging of each message received by a server, per-application stable variables, programmer-supplied failure recovery procedures, server process failure detection, and automatic server process restart. The design and implementation of the fault-tolerance support are optimized for high performance in the normal case (network connectivity provided by a high-latency, low-bandwidth wireless link): measurements show a best-case overhead of less than 7% for a reliable null RPC over wired and cellular dialup links. Experimental results from both micro-benchmarks and applications, such as the Rover Web Browser proxy, show that support for reliable operation can be provided at an overhead of only a few percent of execution time during normal operation.
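The general idea of logging each incoming message to stable storage before processing it, and then running a recovery procedure after a restart, can be illustrated with a minimal sketch. This is not the Rover implementation or its API: the names `StableLog`, `Server`, `receive`, and `recover`, the JSON-lines log format, and the choice to rebuild state by replaying the log are all assumptions made for illustration.

```python
import json
import os
import tempfile


class StableLog:
    """Hypothetical append-only log; each record is forced to disk."""

    def __init__(self, path):
        self.path = path

    def append(self, message):
        # Write one JSON record per line, then force it to stable storage
        # so the record survives a crash of the server process.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Return every logged message in arrival order (empty if no log yet).
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


class Server:
    """Hypothetical server whose state is rebuilt from the log on start."""

    def __init__(self, log):
        self.log = log
        self.counter = 0  # stand-in for application state
        self.recover()

    def recover(self):
        # Assumed recovery procedure: re-apply every logged message.
        self.counter = 0
        for message in self.log.replay():
            self.apply(message)

    def apply(self, message):
        self.counter += message["amount"]

    def receive(self, message):
        self.log.append(message)  # log first, so the message cannot be lost
        self.apply(message)       # then process it
        return "ack"


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "server.log")
    first = Server(StableLog(path))
    first.receive({"amount": 2})
    first.receive({"amount": 3})
    # Simulate a crash and restart: a new process would start from the log.
    restarted = Server(StableLog(path))
    print(restarted.counter)
```

In this sketch the client needs only the acknowledgement; completing the request after a crash is the server's job, which matches the division of responsibility the abstract argues for. A real system would also have to avoid re-executing messages whose effects were already made durable, which this sketch sidesteps by keeping all state in memory.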