This paper presents the design and implementation of Jgroup-ARM, a distributed object group platform with autonomous replication management (ARM). It also presents a novel measurement-based assessment technique used to validate the fault-handling capability of Jgroup-ARM. Jgroup extends Java RMI with the group communication paradigm and has been designed specifically to support applications in partitionable systems. ARM aims to improve the dependability characteristics of systems through a fault-treatment mechanism. It therefore focuses on deployment and operational aspects, where the gain in dependability is likely to be greatest. The main objective of ARM is to localize failures and to reconfigure the system according to application-specific dependability requirements. Combining Jgroup and ARM can significantly reduce the effort needed to develop, deploy, and manage dependable, partition-aware applications. To validate the fault-handling capability of Jgroup-ARM, the recovery performance of a system deployed in a wide area network is evaluated experimentally. In this experiment, multiple nearly coincident reachability changes are injected to emulate network partitions that separate the service replicas. The results show that Jgroup-ARM is able to recover applications to their initial state in several realistic failure scenarios, including multiple concurrent network partitionings. Copyright © 2007 John Wiley & Sons, Ltd.
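The fault-treatment idea described above can be illustrated with a small sketch. ARM localizes failures and reconfigures the system to meet a dependability requirement. The sketch assumes a hypothetical policy that is not taken from the paper: while the network is partitioned, each partition tries to host at least `min_replicas` replicas. After the partitions merge, surplus replicas are removed until the initial replica count is restored. The function names `restore_min_in_partition`, `apply_view`, and `trim_after_merge`, the host names, and the parameter values are all illustrative. They are not part of the Jgroup or ARM API.

```python
def restore_min_in_partition(replicas: set[str], partition: set[str], min_replicas: int) -> list[str]:
    """Hypothetical policy: create replicas on free hosts of `partition` until it holds
    `min_replicas` of them, or until the partition has no free hosts left."""
    present = len(replicas & partition)
    free = sorted(partition - replicas)  # sorted only to make the choice deterministic
    created = []
    while present < min_replicas and free:
        host = free.pop(0)
        replicas.add(host)
        created.append(host)
        present += 1
    return created


def apply_view(replicas: set[str], view: list[set[str]], min_replicas: int) -> list[str]:
    """Apply the policy independently in every partition of the current reachability view."""
    created = []
    for partition in view:
        created += restore_min_in_partition(replicas, partition, min_replicas)
    return created


def trim_after_merge(replicas: set[str], initial_replicas: int) -> list[str]:
    """After all partitions heal, remove surplus replicas until `initial_replicas` remain.
    Reverse host-name order is an arbitrary tie-break, not a real placement criterion."""
    removed = []
    for host in sorted(replicas, reverse=True):
        if len(replicas) <= initial_replicas:
            break
        replicas.remove(host)
        removed.append(host)
    return removed


if __name__ == "__main__":
    replicas = {"h1", "h2", "h3"}  # assumed initial deployment of three replicas
    views = [
        [{"h1", "h2", "h4"}, {"h3", "h5", "h6"}],      # first injected partition
        [{"h1", "h2", "h4"}, {"h3"}, {"h5", "h6"}],    # nearly coincident second split
    ]
    for view in views:
        print("created:", apply_view(replicas, view, min_replicas=2))
    print("removed:", trim_after_merge(replicas, initial_replicas=3))
    print("final:", sorted(replicas))
```

In this sketch, the isolated partition `{"h3"}` cannot reach the minimum because it has no free host, so the policy leaves it as is. Once the views merge, the replica set returns to `{"h1", "h2", "h3"}`, which mirrors the "recover to the initial state" outcome at the level of replica counts only. A real replication manager would also weigh host load and placement when creating or removing replicas.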