Although component middleware is increasingly used to develop distributed, real-time and embedded (DRE) systems, it poses new fault-tolerance challenges, such as the need for efficient synchronization of internal component state, failure correlation across groups of components, and configuration of fault-tolerance properties at the component granularity level. This paper makes three contributions to R&D on component-based fault tolerance. First, it describes the COmponent Replication based on Failover Units (CORFU) component middleware, which provides fail-stop behavior and fault correlation across groups of components that are treated as an atomic unit in DRE systems. Second, it describes how CORFU's Components with HEterogeneous State Synchronization (CHESS) module provides mechanisms for real-time-aware state transfer and synchronization. Third, it empirically evaluates the client failover and group shutdown capabilities of CORFU and its CHESS module and compares them with existing object-oriented fault-tolerance methods. The results show that component middleware (1) has acceptable fault-tolerance performance for DRE systems, (2) allows timely recovery while accounting for the failure location, size, and functional topology of the group, and (3) eases the burden of application development by providing middleware support for fault tolerance at the component level.
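The idea of a failover unit, a group of components that fails and recovers as one atomic unit, can be illustrated with a minimal sketch. This is not CORFU's API: the names `Component`, `FailoverUnit`, `report_failure`, and `select_unit` are hypothetical, and the sketch assumes a simple priority-ordered list of replica groups with no state synchronization or timing constraints.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    """Hypothetical component; only tracks whether it is running."""
    name: str
    alive: bool = True

    def shutdown(self) -> None:
        self.alive = False


@dataclass
class FailoverUnit:
    """Hypothetical group of components treated as one atomic unit."""
    name: str
    components: List[Component] = field(default_factory=list)

    @property
    def active(self) -> bool:
        # The unit is usable only while every member is alive.
        return all(c.alive for c in self.components)

    def report_failure(self, component_name: str) -> None:
        # Assumed group fail-stop: one failed member shuts down the whole unit,
        # so no surviving "orphan" component keeps serving requests.
        if not any(c.name == component_name for c in self.components):
            raise KeyError(component_name)
        for c in self.components:
            c.shutdown()


def select_unit(units: List[FailoverUnit]) -> FailoverUnit:
    """Return the first fully active unit in priority order."""
    for unit in units:
        if unit.active:
            return unit
    raise RuntimeError("no active failover unit available")


primary = FailoverUnit("primary", [Component("sensor"), Component("planner")])
backup = FailoverUnit("backup", [Component("sensor"), Component("planner")])
units = [primary, backup]

before = select_unit(units).name   # client initially uses the primary group
primary.report_failure("planner")  # a single component failure is detected
after = select_unit(units).name    # client fails over to the backup group
print(before, "->", after)
```

In this sketch the failure of `planner` also stops `sensor` in the same unit, so a client never talks to a partially failed group; real middleware would additionally need failure detection, state transfer to the backup, and bounded failover latency.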