Distributed systems, such as satellite surveillance systems and real-time feeds for financial data, must be heterogeneous, interoperable, extensible, and available. Availability is a kind of fault tolerance: the system can still provide important services despite the partial failure of its computers or software objects. The Object Management Group's Common Object Request Broker Architecture (CORBA) addresses only the first three characteristics. With respect to heterogeneity, for example, programmers can hide details of the underlying hardware and system software behind a portable interface, using CORBA's Interface Definition Language (IDL). IDL allows CORBA objects to invoke operations on each other even when they are implemented in different languages and run on incompatible operating systems. Wrapper objects and Object Request Broker (ORB) gateways enable interoperability by letting programmers interface new technology to legacy information systems. Finally, CORBA supports the development of highly modular applications, so programmers can more easily achieve extensibility as well as better maintainability. To help address availability and reliability, the author developed an experimental CORBA-based restart service and monitor called Piranha (not related to the Yale University system). Piranha acts as a network monitor that reports failures through a graphical user interface. It also acts as a manager: it automatically restarts failed CORBA objects, replicates stateful objects (objects that maintain an internal set of values) on the fly, migrates objects from one host to another, and enforces predefined replication degrees (numbers of copies) on groups of objects. The article first examines the ways in which a CORBA ORB should support availability. It then explains how Piranha affords availability.
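The idea of enforcing a replication degree on a group of objects can be illustrated with a minimal sketch. It is not Piranha's implementation and uses no CORBA API; the names `ObjectGroup` and `enforce_degree`, the host labels, and the policy of placing at most one replica per host are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectGroup:
    """Hypothetical group of replicas with a target replication degree."""
    name: str
    degree: int                                   # desired number of live copies
    replicas: dict = field(default_factory=dict)  # replica id -> host name


def enforce_degree(group, live_hosts, next_id):
    """Drop replicas on failed hosts, then start new ones until the degree is met.

    Returns the list of (replica_id, host) pairs that were started.
    Assumed policy: at most one replica of a group per host.
    """
    # Forget replicas whose host is no longer reported alive.
    group.replicas = {rid: host for rid, host in group.replicas.items()
                      if host in live_hosts}

    # Candidate hosts are live hosts not already running a replica of this group.
    used = set(group.replicas.values())
    free_hosts = [h for h in sorted(live_hosts) if h not in used]

    started = []
    while len(group.replicas) < group.degree and free_hosts:
        host = free_hosts.pop(0)
        rid = f"{group.name}-{next_id}"
        next_id += 1
        group.replicas[rid] = host   # stands in for actually restarting the object
        started.append((rid, host))
    return started
```

For example, if a group with degree 3 has replicas on hosts `a`, `b`, and `c`, and the monitor then reports only `a`, `c`, and `d` as alive, one call to `enforce_degree` drops the replica on `b` and starts a replacement on `d`. If too few hosts are alive, the group simply stays below its target degree until more hosts become available.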