This paper reports on the design and correctness of a communication facility for a distributed computer system. The facility supports fault-tolerant process groups through a family of reliable multicast protocols that can be used in both local- and wide-area networks. These protocols attain high levels of concurrency while respecting application-specific delivery ordering constraints, and their cost and performance vary with the degree of ordering desired. In particular, a protocol that enforces causal delivery orderings is introduced and shown to be a valuable alternative to conventional asynchronous communication protocols. The facility also ensures that the processes belonging to a fault-tolerant process group observe consistent orderings of events affecting the group as a whole, including process failures, recoveries, migration, and dynamic changes to group properties such as member rankings. A review of several uses of the protocols in the ISIS system, which supports fault-tolerant resilient objects and bulletin boards, illustrates the significant simplification of higher-level algorithms made possible by our approach.
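To make the idea of causal delivery ordering concrete, the following is a minimal sketch, not the protocol described in the paper. It assumes a fixed group of `n` processes and uses vector timestamps, a well-known technique swapped in here for illustration; the names `Message`, `Process`, `send`, and `receive` are hypothetical. A receiver buffers any message whose causal predecessors have not yet been delivered.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: int     # index of the sending process (hypothetical field)
    stamp: tuple    # sender's vector timestamp at send time
    payload: str


@dataclass
class Process:
    pid: int
    n: int                                        # group size, assumed fixed
    clock: list = field(default_factory=list)     # multicasts delivered per sender
    pending: list = field(default_factory=list)   # received but not yet deliverable
    delivered: list = field(default_factory=list)

    def __post_init__(self):
        self.clock = [0] * self.n

    def send(self, payload):
        # Count this multicast, deliver it locally, and stamp the outgoing copy.
        self.clock[self.pid] += 1
        self.delivered.append(payload)
        return Message(self.pid, tuple(self.clock), payload)

    def _deliverable(self, m):
        # m must be the next multicast from its sender, and every message
        # the sender had delivered before sending m must be delivered here too.
        next_from_sender = m.stamp[m.sender] == self.clock[m.sender] + 1
        deps_met = all(m.stamp[k] <= self.clock[k]
                       for k in range(self.n) if k != m.sender)
        return next_from_sender and deps_met

    def receive(self, m):
        # Buffer the message, then deliver everything that has become deliverable.
        self.pending.append(m)
        progress = True
        while progress:
            progress = False
            for q in list(self.pending):
                if self._deliverable(q):
                    self.pending.remove(q)
                    self.clock[q.sender] += 1
                    self.delivered.append(q.payload)
                    progress = True


# Usage: p1 multicasts "b" after delivering "a", so "b" causally follows "a".
p0, p1, p2 = (Process(i, 3) for i in range(3))
m1 = p0.send("a")
p1.receive(m1)
m2 = p1.send("b")
p2.receive(m2)   # arrives early; buffered because "a" is still missing
p2.receive(m1)   # "a" is delivered, which then releases "b"
```

In this sketch, messages that are not causally related can be delivered in different orders at different processes, which is what allows a causal protocol to be cheaper than one that imposes a single total order.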