Distributed deadlock detection algorithm
ACM Transactions on Database Systems (TODS)
Termination Detection of Diffusing Computations in Communicating Sequential Processes
ACM Transactions on Programming Languages and Systems (TOPLAS)
Distributed deadlock detection
ACM Transactions on Computer Systems (TOCS)
Distributed computation on graphs: shortest path algorithms
Communications of the ACM
Time, clocks, and the ordering of events in a distributed system
Communications of the ACM
Optimistic recovery in distributed systems
ACM Transactions on Computer Systems (TOCS)
ACM Transactions on Programming Languages and Systems (TOPLAS)
An example of stepwise refinement of distributed programs: quiescence detection
ACM Transactions on Programming Languages and Systems (TOPLAS) - The MIT Press scientific computation series
Highly available distributed services and fault-tolerant distributed garbage collection
PODC '86 Proceedings of the fifth annual ACM symposium on Principles of distributed computing
Debugging Parallel Programs with Instant Replay
IEEE Transactions on Computers
PARIS: a system for reusing partially interpreted schemas
ICSE '87 Proceedings of the 9th international conference on Software Engineering
Epidemic algorithms for replicated database maintenance
PODC '87 Proceedings of the sixth annual ACM Symposium on Principles of distributed computing
Detecting global termination conditions in the face of uncertainty
PODC '87 Proceedings of the sixth annual ACM Symposium on Principles of distributed computing
Detection of stable properties in distributed applications
PODC '87 Proceedings of the sixth annual ACM Symposium on Principles of distributed computing
Interleaving set temporal logic
PODC '87 Proceedings of the sixth annual ACM Symposium on Principles of distributed computing
Substituting for real time and common knowledge in asynchronous distributed systems
PODC '87 Proceedings of the sixth annual ACM Symposium on Principles of distributed computing
Epidemic algorithms for replicated database maintenance
ACM SIGOPS Operating Systems Review
Deadlock detection in distributed databases
ACM Computing Surveys (CSUR)
Semantics based transaction management techniques for replicated data
SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data
Debugging concurrent processes: a case study
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Monitoring and performance measuring distributed systems during operation
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Toward a non-atomic era: l-exclusion as a test case
STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
Understanding and verifying distributed algorithms using stratified decomposition
PODC '88 Proceedings of the seventh annual ACM Symposium on Principles of distributed computing
The power of multimedia: combining point-to-point and multi-access networks
PODC '88 Proceedings of the seventh annual ACM Symposium on Principles of distributed computing
Recovery in distributed systems using asynchronous message logging and checkpointing
PODC '88 Proceedings of the seventh annual ACM Symposium on Principles of distributed computing
Concurrent common knowledge: a new definition of agreement for asynchronous systems
PODC '88 Proceedings of the seventh annual ACM Symposium on Principles of distributed computing
Detecting stable properties of networks in concurrent logic programming languages
PODC '88 Proceedings of the seventh annual ACM Symposium on Principles of distributed computing
On achieving consensus using a shared memory
PODC '88 Proceedings of the seventh annual ACM Symposium on Principles of distributed computing
Reliability mechanisms for ADAMS
C3P Proceedings of the third conference on Hypercube concurrent computers and applications - Volume 2
A distributed debugger for Amoeba
PADD '88 Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging
A graphical representation of concurrent processes
PADD '88 Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging
The family of concurrent logic programming languages
ACM Computing Surveys (CSUR)
Efficient distributed recovery using message logging
Proceedings of the eighth annual ACM Symposium on Principles of distributed computing
A compositional approach to superimposition
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Declarative visualization in the shared dataspace paradigm
ICSE '89 Proceedings of the 11th international conference on Software engineering
ACM Computing Surveys (CSUR)
Distributed Checkpointing for Globally Consistent States of Databases
IEEE Transactions on Software Engineering
Knowledge and common knowledge in a distributed environment
Journal of the ACM (JACM)
Fault-tolerant computing based on Mach
ACM SIGOPS Operating Systems Review
Atomic snapshots of shared memory
PODC '90 Proceedings of the ninth annual ACM symposium on Principles of distributed computing
The inhibition spectrum and the achievement of causal consistency
PODC '90 Proceedings of the ninth annual ACM symposium on Principles of distributed computing
Self-stabilizing extensions for message-passing systems
PODC '90 Proceedings of the ninth annual ACM symposium on Principles of distributed computing
Mixed Programming Metaphors in a Shared Dataspace Model of Concurrency
IEEE Transactions on Software Engineering
The use of a synchronizer yields maximum computation rate in distributed networks
STOC '90 Proceedings of the twenty-second annual ACM symposium on Theory of computing
Paradigms for process interaction in distributed programs
ACM Computing Surveys (CSUR)
Replay, recovery, replication, and snapshots of nondeterministic concurrent programs
PODC '91 Proceedings of the tenth annual ACM symposium on Principles of distributed computing
Transparent optimistic rollback recovery
ACM SIGOPS Operating Systems Review
Restoring consistent global states of distributed computations
PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
An approach to reducing delays in recognizing distributed event occurrences
PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
Consistent detection of global predicates
PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
Elements for a course on the design of distributed algorithms
ACM SIGCSE Bulletin
The slide mechanism with applications in dynamic networks
PODC '92 Proceedings of the eleventh annual ACM symposium on Principles of distributed computing
An abstract model of rollback recovery control in distributed systems
ACM SIGOPS Operating Systems Review
Manetho: Transparent Roll Back-Recovery with Low Overhead, Limited Rollback, and Fast Output Commit
IEEE Transactions on Computers - Special issue on fault-tolerant computing
ACM Computing Surveys (CSUR)
Simulating synchronized clocks and common knowledge in distributed systems
Journal of the ACM (JACM)
The derivation of distributed termination detection algorithms from garbage collection schemes
ACM Transactions on Programming Languages and Systems (TOPLAS)
Atomic snapshots of shared memory
Journal of the ACM (JACM)
Causal controversy at Le Mont St.-Michel
ACM SIGOPS Operating Systems Review
Making parallel simulations go fast
WSC '92 Proceedings of the 24th conference on Winter simulation
A superimposition control construct for distributed systems
ACM Transactions on Programming Languages and Systems (TOPLAS)
Adaptive message logging for incremental replay of message-passing programs
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Detecting relational global predicates in distributed systems
PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
Detecting atomic sequences of predicates in distributed computations
PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
The pessimism behind optimistic simulation
PADS '94 Proceedings of the eighth workshop on Parallel and distributed simulation
Reliable and efficient hop-by-hop flow control
SIGCOMM '94 Proceedings of the conference on Communications architectures, protocols and applications
A distributed garbage collector for active objects
OOPSLA '94 Proceedings of the ninth annual conference on Object-oriented programming systems, language, and applications
ENF event predicate detection in distributed systems
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
A checkpoint protocol for an entry consistent shared memory system
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Self-stabilization by counter flushing
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Memory-efficient and self-stabilizing network RESET (extended abstract)
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Uniform actions in asynchronous distributed systems
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
On the memory overhead of distributed snapshots
PODC '94 Proceedings of the thirteenth annual ACM symposium on Principles of distributed computing
Local and temporal predicates in distributed systems
ACM Transactions on Programming Languages and Systems (TOPLAS)
An (N-1)-Resilient Algorithm for Distributed Termination Detection
IEEE Transactions on Parallel and Distributed Systems
Concurrent and Distributed Garbage Collection of Active Objects
IEEE Transactions on Parallel and Distributed Systems
Checkpoint Space Reclamation for Uncoordinated Checkpointing in Message-Passing Systems
IEEE Transactions on Parallel and Distributed Systems
Testing and Debugging Distributed Programs Using Global Predicates
IEEE Transactions on Software Engineering
Online tracking of mobile users
Journal of the ACM (JACM)
Detection and resolution of deadlocks in distributed database systems
CIKM '95 Proceedings of the fourth international conference on Information and knowledge management
A case for two-level distributed recovery schemes
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
On distributed object checkpointing and recovery
Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing
On the relevance of communication costs of rollback-recovery protocols
Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing
Reasoning about meta level activities in open distributed systems
Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing
Finite termination of asynchronous iterative algorithms
Parallel Computing
Indirect distributed garbage collection: handling object migration
ACM Transactions on Programming Languages and Systems (TOPLAS)
An online computation of critical path profiling
SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Debugging race conditions in message-passing programs
SPDT '96 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems
IEEE Transactions on Parallel and Distributed Systems
Adaptive recovery for mobile environments
Communications of the ACM
An Architecture for Tolerating Processor Failures in Shared-Memory Multiprocessors
IEEE Transactions on Computers
Detection of Strong Unstable Predicates in Distributed Programs
IEEE Transactions on Parallel and Distributed Systems
Trade-offs in implementing causal message logging protocols
PODC '96 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing
A reliable and scalable striping protocol
Conference proceedings on Applications, technologies, architectures, and protocols for computer communications
Optimistic Crash Recovery without Changing Application Messages
IEEE Transactions on Parallel and Distributed Systems
ACM SIGOPS Operating Systems Review
Distributed termination detection for dynamic systems
Parallel Computing
Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints
IEEE Transactions on Computers
Distributed deadlock detection in Ada run-time environments
TRI-Ada '90 Proceedings of the conference on TRI-ADA '90
An algorithm for message delivery to mobile units
PODC '97 Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing
A Survey of Distributed Database Checkpointing
Distributed and Parallel Databases
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Protocols for Integrity Constraint Checking in Federated Databases
Distributed and Parallel Databases
A Survey of Recoverable Distributed Shared Virtual Memory Systems
IEEE Transactions on Parallel and Distributed Systems
Progressive Retry for Software Failure Recovery in Message-Passing Applications
IEEE Transactions on Computers
Efficient transparent application recovery in client-server information systems
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A geographically distributed framework for embedded system design and validation
DAC '98 Proceedings of the 35th annual Design Automation Conference
Persistent messages in local transactions
PODC '98 Proceedings of the seventeenth annual ACM symposium on Principles of distributed computing
Fault-tolerant distributed simulation
PADS '98 Proceedings of the twelfth workshop on Parallel and distributed simulation
A Case for Two-Level Recovery Schemes
IEEE Transactions on Computers
Webs of Archived Distributed Computations for Asynchronous Collaboration
The Journal of Supercomputing - Special issue: high performance distributed computing
Efficient and flexible fault tolerance and migration of scientific simulations using CUMULVS
SPDT '98 Proceedings of the SIGMETRICS symposium on Parallel and distributed tools
IEEE Transactions on Parallel and Distributed Systems
Critical Path Profiling of Message Passing and Shared-Memory Programs
IEEE Transactions on Parallel and Distributed Systems
On Coordinated Checkpointing in Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
An Index-Based Checkpointing Algorithm for Autonomous Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Transparent adaptive parallelism on NOWs using OpenMP
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Rollback-dependency trackability: visible characterizations
Proceedings of the eighteenth annual ACM symposium on Principles of distributed computing
Optimism: not just for event execution anymore
PADS '99 Proceedings of the thirteenth workshop on Parallel and distributed simulation
SFT: a consistent checkpointing algorithm with shorter freezing time
ACM SIGOPS Operating Systems Review
Algorithm development in the mobile environment
Proceedings of the 21st international conference on Software engineering
Fault-tolerant distributed simulation
WSC '91 Proceedings of the 23rd conference on Winter simulation
Event-Based Techniques to Debug an Object Request Broker
The Journal of Supercomputing
Staggered Consistent Checkpointing
IEEE Transactions on Parallel and Distributed Systems
Communication-Induced Determination of Consistent Snapshots
IEEE Transactions on Parallel and Distributed Systems
A module on distributed systems for the operating systems course
SIGCSE '90 Proceedings of the twenty-first SIGCSE technical symposium on Computer science education
Checkpointing and rollback-recovery for distributed systems
ACM '86 Proceedings of 1986 ACM Fall joint computer conference
An architecture for packet-striping protocols
ACM Transactions on Computer Systems (TOCS)
The Journal of Supercomputing
A Low Overhead Logging Scheme for Fast Recovery in Distributed Shared Memory Systems
The Journal of Supercomputing
Debugging distributed programs using controlled re-execution
Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing
Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing
Mutable Checkpoints: A New Checkpointing Approach for Mobile Computing Systems
IEEE Transactions on Parallel and Distributed Systems
Increasing the confidence in off-the-shelf components: a software connector-based approach
SSR '01 Proceedings of the 2001 symposium on Software reusability: putting software reuse in context
The concurrency hierarchy, and algorithms for unbounded concurrency
Proceedings of the twentieth annual ACM symposium on Principles of distributed computing
Techniques to Tackle State Explosion in Global Predicate Detection
IEEE Transactions on Software Engineering
Transparent optimistic rollback recovery
EW 4 Proceedings of the 4th workshop on ACM SIGOPS European workshop
Causality in distributed systems
EW 5 Proceedings of the 5th workshop on ACM SIGOPS European workshop: Models and paradigms for distributed systems structuring
Distributed Predicate Detection in Series-Parallel Systems
IEEE Transactions on Parallel and Distributed Systems
Highly efficient gang scheduling implementation
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
A checkpoint-based high availability run-time system for Windows NT clusters
ACM SIGOPS Operating Systems Review
Using passive object garbage collection algorithms for garbage collection of active objects
Proceedings of the 3rd international symposium on Memory management
A Formal Specification and Verification Framework for Time Warp-Based Parallel Simulation
IEEE Transactions on Software Engineering
Tracking immediate predecessors in distributed computations
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
A Roll-Forward Recovery Scheme for Solving the Problem of Coasting Forward for Distributed Systems
ACM SIGOPS Operating Systems Review
Logical Clock Requirements for Reverse Engineering Scenarios from a Distributed System
IEEE Transactions on Software Engineering
A Distributed Parallel Programming Framework
IEEE Transactions on Software Engineering
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
Proceedings of the 8th annual international conference on Mobile computing and networking
On-the-fly calculation and verification of consistent steering transactions
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Undo as concurrent inverse in group editors
ACM Transactions on Computer-Human Interaction (TOCHI)
Efficient Garbage Collection Schemes for Causal Message Logging with Independent Checkpointing
The Journal of Supercomputing
Triggered message sequence charts
Proceedings of the 10th ACM SIGSOFT symposium on Foundations of software engineering
Concurrent single stepping in event-visualization tools
Cluster Computing
Optimal Distributed Arc-Consistency
Constraints
Triggered message sequence charts
ACM SIGSOFT Software Engineering Notes
Adaptive Message Logging for Incremental Program Replay
IEEE Parallel & Distributed Technology: Systems & Technology
Bounded and Minimum Global Snapshots
IEEE Parallel & Distributed Technology: Systems & Technology
ickp: A Consistent Checkpointer for Multicomputers
IEEE Parallel & Distributed Technology: Systems & Technology
Methods for Observing Global Properties in Distributed Systems
IEEE Parallel & Distributed Technology: Systems & Technology
A Framework for Distributed Debugging
IEEE Software
Reliability Through Consistency
IEEE Software
Nest: A Nested-Predicate Scheme for Fault Tolerance
IEEE Transactions on Computers
IEEE Transactions on Computers
An Adaptive Checkpointing Scheme for Distributed Databases with Mixed Types of Transactions
IEEE Transactions on Knowledge and Data Engineering
Development of a Class of Distributed Termination Detection Algorithms
IEEE Transactions on Knowledge and Data Engineering
The Distributed Constraint Satisfaction Problem: Formalization and Algorithms
IEEE Transactions on Knowledge and Data Engineering
Rollback Recovery in Distributed Systems Using Loosely Synchronized Clocks
IEEE Transactions on Parallel and Distributed Systems
Checkpointing for Distributed Databases: Starting from the Basics
IEEE Transactions on Parallel and Distributed Systems
An Implementation of F-Channels
IEEE Transactions on Parallel and Distributed Systems
An Efficient Protocol for Checkpointing Recovery in Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Detection of Weak Unstable Predicates in Distributed Programs
IEEE Transactions on Parallel and Distributed Systems
Repeated Computation of Global Functions in a Distributed Environment
IEEE Transactions on Parallel and Distributed Systems
Low-Latency, Concurrent Checkpointing for Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Efficient Rollback-Recovery Technique in Distributed Computing Systems
IEEE Transactions on Parallel and Distributed Systems
Finding Consistent Global Checkpoints in a Distributed Computation
IEEE Transactions on Parallel and Distributed Systems
Proof Rules for Flush Channels
IEEE Transactions on Software Engineering
IEEE Transactions on Software Engineering
Efficient Detection and Resolution of Generalized Distributed Deadlocks
IEEE Transactions on Software Engineering
Consistency Issues in Distributed Checkpoints
IEEE Transactions on Software Engineering
An Efficient Distributed Online Algorithm to Detect Strong Conjunctive Predicates
IEEE Transactions on Software Engineering
Checkpointing with mutable checkpoints
Theoretical Computer Science - Dependable computing
Bounded time-stamping in message-passing systems
Theoretical Computer Science
Journal of Parallel and Distributed Computing - Self-stabilizing distributed systems
Interval consistency of asynchronous distributed computations
Journal of Computer and System Sciences
Perfect Failure Detection in Timed Asynchronous Systems
IEEE Transactions on Computers
An Experimental Evaluation of Coordinated Checkpointing in a Parallel Machine
EDCC-3 Proceedings of the Third European Dependable Computing Conference on Dependable Computing
Detection of Orthogonal Interval Relations
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Performance Evaluation of Fault Tolerance for Parallel Applications in Networked Environments
ICPP '97 Proceedings of the international Conference on Parallel Processing
CoCheck: Checkpointing and Process Migration for MPI
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Interactive Visual Exploration of Distributed Computations
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
Detecting Temporal Logic Predicates on the Happened-Before Model
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Checkpointing and Rollback of Wide-area Distributed Applications using Mobile Agents
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
FANTOMAS: Fault Tolerance for Mobile Agents in Clusters
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
QoS based Checkpoint Protocol in Multimedia Network Systems
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Concurrent Reading and Writing with Mobile Agents
IWDC '02 Proceedings of the 4th International Workshop on Distributed Computing, Mobile and Wireless Computing
Two Epoch Algorithms for Disaster Recovery
VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
On the Complexity of the Minimum and Maximum Global Snapshot Problems
COMPSAC '97 Proceedings of the 21st International Computer Software and Applications Conference
Computation Slicing: Techniques and Theory
DISC '01 Proceedings of the 15th International Conference on Distributed Computing
Guaranteed Mutually Consistent Checkpointing in Distributed Computations
ASIAN '98 Proceedings of the 4th Asian Computing Science Conference on Advances in Computing Science
Distributed Checkpointing on Clusters with Dynamic Striping and Staggering
ASIAN '02 Proceedings of the 7th Asian Computing Science Conference on Advances in Computing Science: Internet Computing and Modeling, Grid Computing, Peer-to-Peer Computing, and Cluster
Shortcut Replay: A Replay Technique for Debugging Long-Running Parallel Programs
ASIAN '02 Proceedings of the 7th Asian Computing Science Conference on Advances in Computing Science: Internet Computing and Modeling, Grid Computing, Peer-to-Peer Computing, and Cluster
An Efficient Coordinated Checkpointing Scheme Based on PWD Model
ICOIN '02 Revised Papers from the International Conference on Information Networking, Wireless Communications Technologies and Network Applications-Part II
A Hybrid Fault-Tolerant Scheme Based on Checkpointing in MASs
ICOIN '02 Revised Papers from the International Conference on Information Networking, Wireless Communications Technologies and Network Applications-Part II
A Structural Embedding of Ocsid in PVS
TPHOLs '01 Proceedings of the 14th International Conference on Theorem Proving in Higher Order Logics
Instant Image: Transitive and Cyclical Snapshots in Distributed Storage Volumes
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Universal Constructs in Distributed Computations
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Agents, Distributed Algorithms, and Stabilization
COCOON '00 Proceedings of the 6th Annual International Conference on Computing and Combinatorics
Distributed Configuration as Distributed Dynamic Constraint Satisfaction
Proceedings of the 14th International conference on Industrial and engineering applications of artificial intelligence and expert systems: engineering of intelligent systems
Checkpoint-Recovery for Mobile Intelligent Networks
Proceedings of the 14th International conference on Industrial and engineering applications of artificial intelligence and expert systems: engineering of intelligent systems
Keeping Track of the Latest Gossip in Shared Memory Systems
FST TCS 2000 Proceedings of the 20th Conference on Foundations of Software Technology and Theoretical Computer Science
Concurrent Knowledge and Logical Clock Abstractions
FST TCS 2000 Proceedings of the 20th Conference on Foundations of Software Technology and Theoretical Computer Science
Distributed Reinforcement of Arc-Consistency
PRICAI '02 Proceedings of the 7th Pacific Rim International Conference on Artificial Intelligence: Trends in Artificial Intelligence
Design Evolution of the EROS Single-Level Store
ATEC '02 Proceedings of the General Track of the annual conference on USENIX Annual Technical Conference
Algorithm Visualization For Distributed Environments
INFOVIS '98 Proceedings of the 1998 IEEE Symposium on Information Visualization
(Im)Possibilities of Predicate Detection in Crash-Affected Systems
WSS '01 Proceedings of the 5th International Workshop on Self-Stabilizing Systems
Recent Advances in Distributed Garbage Collection
Advances in Distributed Systems, Advanced Distributed Computing: From Algorithms to Systems
Termination Detection of Distributed Algorithms by Graph Relabelling Systems
ICGT '02 Proceedings of the First International Conference on Graph Transformation
Mechanizing Proofs of Computation Equivalence
CAV '99 Proceedings of the 11th International Conference on Computer Aided Verification
A Fault-Tolerant Scheme of Multi-agent System for Worker Agents
AMT '01 Proceedings of the 6th International Computer Science Conference on Active Media Technology
Synergistic Coordination between Software and Hardware Fault Tolerance Techniques
DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
Extending PVM with Consistent Cut Capabilities: Application Aspects and Implementation Strategies
Proceedings of the 6th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
QoS-Based Checkpoint Protocol for Multimedia Network Systems
PCM '01 Proceedings of the Second IEEE Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
Protocol for Taking Object-Based Checkpoints
DEXA '00 Proceedings of the 11th International Conference on Database and Expert Systems Applications
The VLDB Journal — The International Journal on Very Large Data Bases
An Efficient Optimistic Message Logging Scheme for Recoverable Mobile Computing Systems
IEEE Transactions on Mobile Computing
Automated application-level checkpointing of MPI programs
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Single stepping in event-visualization tools
CASCON '96 Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research
Collective operations in application-level fault-tolerant MPI
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Debugging in a Distributed World: Observation and Control
ASSET '98 Proceedings of the 1998 IEEE Workshop on Application - Specific Software Engineering and Technology
A Fair Fast Distributed Concurrent-Reader Exclusive-Writer Synchronization
FRONTIERS '96 Proceedings of the 6th Symposium on the Frontiers of Massively Parallel Computation
Supporting fault-tolerance in heterogeneous distributed applications
HCW '97 Proceedings of the 6th Heterogeneous Computing Workshop (HCW '97)
A world-wide distributed system using Java and the Internet
HPDC '96 Proceedings of the 5th IEEE International Symposium on High Performance Distributed Computing
Concurrent rollback for crash recovery in extended hypercube networks
PAS '95 Proceedings of the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis
Minimizing timestamp size for completely asynchronous optimistic recovery with minimal rollback
SRDS '96 Proceedings of the 15th Symposium on Reliable Distributed Systems
SRDS '96 Proceedings of the 15th Symposium on Reliable Distributed Systems
SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems
Optimistic Recovery in Multi-Threaded Distributed Systems
SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems
Object-Based Checkpoints in Distributed Systems
WORDS '97 Proceedings of the 3rd Workshop on Object-Oriented Real-Time Dependable Systems - (WORDS '97)
Checkpoint and Rollback in Asynchronous Distributed Systems
INFOCOM '97 Proceedings of the INFOCOM '97. Sixteenth Annual Joint Conference of the IEEE Computer and Communications Societies. Driving the Information Revolution
User-Triggered Checkpointing: System-Independent and Scalable Application Recovery
ISCC '97 Proceedings of the 2nd IEEE Symposium on Computers and Communications (ISCC '97)
Termination detection in data-driven parallel computations/applications
Journal of Parallel and Distributed Computing
Evaluating Distributed Checkpointing Protocol
ICDCS '03 Proceedings of the 23rd International Conference on Distributed Computing Systems
ICDCS '03 Proceedings of the 23rd International Conference on Distributed Computing Systems
User-level checkpointing through exportable kernel state
IWOOOS '96 Proceedings of the 5th International Workshop on Object Orientation in Operating Systems (IWOOOS '96)
An Exercise in Formal Reasoning about Mobile Communications
IWSSD '98 Proceedings of the 9th international workshop on Software specification and design
A Mechanized Proof Environment for the Convenient Computations Proof Method
Formal Methods in System Design
Error detection in large-scale parallel programs with long runtimes
Future Generation Computer Systems - Tools for program development and analysis
Algorithm-Based Diskless Checkpointing for Fault-Tolerant Matrix Operations
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Completely Asynchronous Optimistic Recovery with Minimal Rollbacks
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Checkpointing and Its Applications
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Fault Tolerance for Off-the-Shelf Applications and Hardware
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
On Detecting Global Predicates in Distributed Computations
ICDCS '01 Proceedings of the 21st International Conference on Distributed Computing Systems
Self-Stabilizing PIF Algorithm in Arbitrary Rooted Networks
ICDCS '01 Proceedings of the 21st International Conference on Distributed Computing Systems
Design and Implementation of a Composable Reflective Middleware Framework
ICDCS '01 Proceedings of the 21st International Conference on Distributed Computing Systems
A Protocol Design of Communication State Transfer for Distributed Computing
ICDCS '01 Proceedings of the 21st International Conference on Distributed Computing Systems
On Slicing a Distributed Computation
ICDCS '01 Proceedings of the 21st International Conference on Distributed Computing Systems
Enforcing Perfect Failure Detection
ICDCS '01 Proceedings of the 21st International Conference on Distributed Computing Systems
Predicate Control for Active Debugging of Distributed Programs
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
An algorithm for Supporting Fault Tolerant Objects in Distributed Object-Oriented Operating Systems
IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems
Checkpointing and Recovery for Distributed Shared Memory Applications
IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems
A Fine-Grained Modality Classification for Global Predicates
IEEE Transactions on Parallel and Distributed Systems
On Properties of RDT Communication-Induced Checkpointing Protocols
IEEE Transactions on Parallel and Distributed Systems
ACM SIGACT News distributed computing column 12
ACM SIGACT News
Granularity-Driven Dynamic Predicate Slicing Algorithms for Message Passing Systems
Automated Software Engineering
Distributed recovery with K-optimistic logging
Journal of Parallel and Distributed Computing
Causality tracking in causal message-logging protocols
Distributed Computing
Action systems in incremental and aspect-oriented modeling
Distributed Computing - Papers in celebration of the 20th anniversary of PODC
On designing direct dependency-based fast recovery algorithms for distributed systems
ACM SIGOPS Operating Systems Review
Finding a Recovery Line in Uncoordinated Checkpointing
ICDCSW '04 Proceedings of the 24th International Conference on Distributed Computing Systems Workshops - W7: EC (ICDCSW'04) - Volume 7
Predicate control: synchronization in distributed computations with look-ahead
Journal of Parallel and Distributed Computing
Energy-aware deterministic fault tolerance in distributed real-time embedded systems
Proceedings of the 41st annual Design Automation Conference
A Global-State-Triggered Fault Injector for Distributed System Evaluation
IEEE Transactions on Parallel and Distributed Systems
Quantifying rollback propagation in distributed checkpointing
Journal of Parallel and Distributed Computing
A causal message logging protocol for mobile nodes in mobile computing systems
Future Generation Computer Systems - Special issue: Advanced services for clusters and internet computing
Communication State Transfer for the Mobility of Concurrent Heterogeneous Computing
IEEE Transactions on Computers
Fast, Centralized Detection and Resolution of Distributed Deadlocks in the Generalized Model
IEEE Transactions on Software Engineering
Agent-Based Approach to Dynamic Meeting Scheduling Problems
AAMAS '04 Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems - Volume 3
Concurrent checkpoint initiation and recovery algorithms on asynchronous ring networks
Journal of Parallel and Distributed Computing
Application-level checkpointing for shared memory programs
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
A link between knowledge and communication in faulty distributed systems
TARK '90 Proceedings of the 3rd conference on Theoretical aspects of reasoning about knowledge
A knowledge theoretic account of recovery in distributed systems: the case of negotiated commitment
TARK '88 Proceedings of the 2nd conference on Theoretical aspects of reasoning about knowledge
Checkpointing for Peta-Scale Systems: A Look into the Future of Practical Rollback-Recovery
IEEE Transactions on Dependable and Secure Computing
PDB: Pervasive Debugging With Xen
GRID '04 Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing
Checkpoint and Restart for Distributed Components in XCAT3
GRID '04 Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing
A Termination Detection Protocol for Use in Mobile Ad Hoc Networks
Automated Software Engineering
Communication-based prevention of useless checkpoints in distributed computations
Distributed Computing
Constraint-based structuring of network protocols
Distributed Computing
Detection of global predicates: techniques and their limitations
Distributed Computing
Extensible, Scalable Monitoring for Clusters of Computers
LISA '97 Proceedings of the 11th USENIX conference on System administration
The power of logical clock abstractions
Distributed Computing
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 1 - Volume 02
Optimizing Checkpoint Sizes in the C3 System
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
A novel min-process checkpointing scheme for mobile computing systems
Journal of Systems Architecture: the EUROMICRO Journal
Safety assurance via on-line monitoring
Distributed Computing
Synchronous, asynchronous, and causally ordered communication
Distributed Computing
Efficient detection of a class of stable properties
Distributed Computing
Strong stable properties in distributed systems
Distributed Computing
Efficient algorithms for optimistic crash recovery
Distributed Computing
Concurrent common knowledge: defining agreement for asynchronous systems
Distributed Computing
Verification of distributed programs using representative interleaving sequences
Distributed Computing
µsik - A Micro-Kernel for Parallel/Distributed Simulation Systems
Proceedings of the 19th Workshop on Principles of Advanced and Distributed Simulation
Self-stabilizing extensions for message-passing systems
Distributed Computing - Special issue: Self-stabilization
The inhibition spectrum and the achievement of causal consistency
Distributed Computing
Detecting causal relationships in distributed computations: in search of the holy grail
Distributed Computing
Intractability results in predicate detection
Information Processing Letters
On deadlocks of exclusive AND-requests for resources
Distributed Computing
Fault tolerance for internet agent systems: in cases of stop failure and byzantine failure
Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
On the design of a pervasive debugger
Proceedings of the sixth international symposium on Automated analysis-driven debugging
A channel memory based fault tolerance for MPI applications
Future Generation Computer Systems - Special issue: Parallel computing technologies
Causality-Based Predicate Detection across Space and Time
IEEE Transactions on Computers
Using Consistent Global Checkpoints to Synchronize Processes in Distributed Simulation
DS-RT '05 Proceedings of the 9th IEEE International Symposium on Distributed Simulation and Real-Time Applications
Event-based Programming Models for
DS-RT '05 Proceedings of the 9th IEEE International Symposium on Distributed Simulation and Real-Time Applications
An Efficient Index-Based Checkpointing Protocol with Constant-Size Control Information on Messages
IEEE Transactions on Dependable and Secure Computing
A visual environment for distributed simulation systems
ACM SIGSIM Simulation Digest
Asynchronous backtracking without adding links: a new member in the ABT family
Artificial Intelligence - Special issue: Distributed constraint satisfaction
Asynchronous aggregation and consistency in distributed constraint satisfaction
Artificial Intelligence - Special issue: Distributed constraint satisfaction
Meetings scheduling solver enhancement with local consistency reinforcement
Applied Intelligence
Performance evaluation of automatic checkpoint-based fault tolerance for AMPI and Charm++
ACM SIGOPS Operating Systems Review
Performance analysis of different checkpointing and recovery schemes using stochastic model
Journal of Parallel and Distributed Computing
Journal of Parallel and Distributed Computing
Finding a suitable checkpoint and recovery protocol for a distributed application
Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium
Fast batched data transfer with flush channels: A performance analysis
Journal of Parallel and Distributed Computing
Techniques and applications of computation slicing
Distributed Computing
Manufacturing opaque predicates in distributed systems for code obfuscation
ACSC '06 Proceedings of the 29th Australasian Computer Science Conference - Volume 48
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Cyclic Storage for Fault-Tolerant Distributed Executions
IEEE Transactions on Parallel and Distributed Systems
Detecting and Isolating Malicious Routers
IEEE Transactions on Dependable and Secure Computing
Safety and consistency in policy-based authorization systems
Proceedings of the 13th ACM conference on Computer and communications security
Experimental evaluation of application-level checkpointing for OpenMP programs
Proceedings of the 20th annual international conference on Supercomputing
Scalable algorithms for global snapshots in distributed systems
Proceedings of the 20th annual international conference on Supercomputing
Realizing the e-science desktop peer using a peer-to-peer distributed virtual machine middleware
Proceedings of the 4th international workshop on Middleware for grid computing
Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Using queries for distributed monitoring and forensics
Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006
Declarative failure recovery for sensor networks
Proceedings of the 6th international conference on Aspect-oriented software development
Quasi-atomic recovery for distributed agents
Parallel Computing
Efficient detection of a locally stable predicate in a distributed system
Journal of Parallel and Distributed Computing
Journal of Parallel and Distributed Computing
Peer-to-Peer and fault-tolerance: Towards deployment-based technical services
Future Generation Computer Systems
Exploring failure transparency and the limits of generic recovery
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
Formal Verification of Simulation Traces Using Computation Slicing
IEEE Transactions on Computers
On the Complexity of Removing Z-Cycles from a Checkpoints and Communication Pattern
IEEE Transactions on Computers
Detecting Arbitrary Stable Properties Using Efficient Snapshots
IEEE Transactions on Software Engineering
Self-stabilizing algorithm for checkpointing in a distributed system
Journal of Parallel and Distributed Computing
Lightweight consistency enforcement schemes for distributed proofs with hidden subtrees
Proceedings of the 12th ACM symposium on Access control models and technologies
Object caching in a CORBA compliant system
COOTS'96 Proceedings of the 2nd conference on USENIX Conference on Object-Oriented Technologies (COOTS) - Volume 2
Transparent fault tolerance for parallel applications on networks of workstations
ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference
Testing Dynamic Adaptation in Distributed Systems
AST '07 Proceedings of the Second International Workshop on Automation of Software Test
An agent-based approach to solve dynamic meeting scheduling problems with preferences
Engineering Applications of Artificial Intelligence
An efficient delay-optimal distributed termination detection algorithm
Journal of Parallel and Distributed Computing
Modeling and design of fault-tolerant and self-adaptive reconfigurable networked embedded systems
EURASIP Journal on Embedded Systems
An enhanced model-based checkpointing protocol
PDCN'07 Proceedings of the 25th IASTED International Multi-Conference: parallel and distributed computing and networks
Transactions with isolation and cooperation
Proceedings of the 22nd annual ACM SIGPLAN conference on Object-oriented programming systems and applications
Temporal Predicate Detection Using Synchronized Clocks
IEEE Transactions on Computers
Solving Computation Slicing Using Predicate Detection
IEEE Transactions on Parallel and Distributed Systems
Towards distributed service provisioning
Proceedings of the 6th international conference on Mobile and ubiquitous multimedia
DS-RT '07 Proceedings of the 11th IEEE International Symposium on Distributed Simulation and Real-Time Applications
Distributed Watchpoints: Debugging Large Modular Robot Systems
International Journal of Robotics Research
Model-based performance evaluation of distributed checkpointing protocols
Performance Evaluation
A synchronous checkpointing protocol for mobile distributed systems: probabilistic approach
International Journal of Information and Computer Security
Coordinated checkpoint versus message log for fault tolerant MPI
International Journal of High Performance Computing and Networking
Data sharing vs. message passing: synergy or incompatibility?: an implementation-driven case study
Proceedings of the 2008 ACM symposium on Applied computing
Transparent checkpoint-restart of multiple processes on commodity operating systems
ATC'07 Proceedings of the 2007 USENIX Annual Technical Conference
Fundamenta Informaticae - Special issue on ASM'05
On termination detection in crash-prone distributed systems with failure detectors
Journal of Parallel and Distributed Computing
Data-stream-based global event monitoring using pairwise interactions
Journal of Parallel and Distributed Computing
Tracking in a spaghetti bowl: monitoring transactions using footprints
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A low-cost hybrid coordinated checkpointing protocol for mobile distributed systems
Mobile Information Systems
Communication analysis of distributed programs
Scientific Programming - Parallel/High-Performance Object-Oriented Scientific Computing (POOSC '05), Glasgow, UK, 25 July 2005
A quasi-synchronous checkpointing algorithm that prevents contention for stable storage
Information Sciences: an International Journal
A new class of nature-inspired algorithms for self-adaptive peer-to-peer computing
ACM Transactions on Autonomous and Adaptive Systems (TAAS)
Applying static network protocols to dynamic networks
SFCS '87 Proceedings of the 28th Annual Symposium on Foundations of Computer Science
Consensus routing: the internet as a distributed system
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
Optimal maintenance of a spanning tree
Journal of the ACM (JACM)
2-step algorithm for enhancing effectiveness of sender-based message logging
SpringSim '07 Proceedings of the 2007 spring simulation multiconference - Volume 2
Taking snapshots of virtual networked environments
VTDC '07 Proceedings of the 2nd international workshop on Virtualization technology in distributed computing
Testing Distributed Systems Through Symbolic Model Checking
FORTE '07 Proceedings of the 27th IFIP WG 6.1 international conference on Formal Techniques for Networked and Distributed Systems
ModHel'X: A Component-Oriented Approach to Multi-Formalism Modeling
Models in Software Engineering
Distributed Semantics and Implementation for Systems with Interaction and Priority
FORTE '08 Proceedings of the 28th IFIP WG 6.1 international conference on Formal Techniques for Networked and Distributed Systems
Enforcing Safety and Consistency Constraints in Policy-Based Authorization Systems
ACM Transactions on Information and System Security (TISSEC)
WSEAS Transactions on Computers
Journal of Parallel and Distributed Computing
Empire of colonies: Self-stabilizing and self-organizing distributed algorithm
Theoretical Computer Science
Sensornet Checkpointing: Enabling Repeatability in Testbeds and Realism in Simulations
EWSN '09 Proceedings of the 6th European Conference on Wireless Sensor Networks
Transparent checkpoints of closed distributed systems in Emulab
Proceedings of the 4th ACM European conference on Computer systems
Computer Networks: The International Journal of Computer and Telecommunications Networking
Interconnect agnostic checkpoint/restart in open MPI
Proceedings of the 18th ACM international symposium on High performance distributed computing
CrystalBall: predicting and preventing inconsistencies in deployed distributed systems
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
NetReview: detecting when interdomain routing goes wrong
NSDI'09 Proceedings of the 6th USENIX symposium on Networked systems design and implementation
Distributed Log-based Reconciliation
Proceedings of the 2006 conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 -- September 1, 2006, Riva del Garda, Italy
International Journal of High Performance Computing Applications
Transparent parallel checkpointing and migration in clusters and ClusterGrids
International Journal of Computational Science and Engineering
A Snapshot Algorithm for Mobile Ad Hoc Networks
IWANN '09 Proceedings of the 10th International Work-Conference on Artificial Neural Networks: Part II: Distributed Computing, Artificial Intelligence, Bioinformatics, Soft Computing, and Ambient Assisted Living
Measurement and modeling of a large-scale overlay for multimedia streaming
The Fourth International Conference on Heterogeneous Networking for Quality, Reliability, Security and Robustness & Workshops
Brief announcement: virtual world consistency: a new condition for STM systems
Proceedings of the 28th ACM symposium on Principles of distributed computing
A novel low-overhead recovery approach for distributed systems
Journal of Computer Systems, Networks, and Communications
Demo abstract: Sensornet checkpointing between simulated and deployed networks
IPSN '09 Proceedings of the 2009 International Conference on Information Processing in Sensor Networks
Locally Distributed Predicates: A Programming Facility for Distributed State Detection
ICLP '09 Proceedings of the 25th International Conference on Logic Programming
Efficient model checking for LTL with partial order snapshots
Theoretical Computer Science
International Journal of High Performance Computing Applications
An autonomous agent approach to query optimization in stream grids
Proceedings of the International Conference on Management of Emergent Digital EcoSystems
Macrodebugging: global views of distributed program execution
Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems
Scalable temporal order analysis for large scale debugging
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Application and middleware transparent checkpointing with TCKPT on ClusterGrids
Future Generation Computer Systems
SBDO: A New Robust Approach to Dynamic Distributed Constraint Optimisation
PRIMA '09 Proceedings of the 12th International Conference on Principles of Practice in Multi-Agent Systems
Predicting and preventing inconsistencies in deployed distributed systems
ACM Transactions on Computer Systems (TOCS)
A tale of two planners: modular robotic planning with LDP
IROS'09 Proceedings of the 2009 IEEE/RSJ international conference on Intelligent robots and systems
A weighted checkpointing protocol for mobile distributed systems
International Journal of Ad Hoc and Ubiquitous Computing
'Conceptual distance' and interface-supported visualization of information objects and patterns
Journal of Visual Languages and Computing
SSS'06 Proceedings of the 8th international conference on Stabilization, safety, and security of distributed systems
An automata-based approach to property testing in event traces
TestCom'03 Proceedings of the 15th IFIP international conference on Testing of communicating systems
VECPAR'02 Proceedings of the 5th international conference on High performance computing for computational science
Improving dependability of component-based systems via multi-versioning connectors
Architecting dependable systems
Parametric and sliced causality
CAV'07 Proceedings of the 19th international conference on Computer aided verification
Distributed forward checking may lie for privacy
CSCLP'06 Proceedings of the constraint solving and constraint logic programming 11th annual ERCIM international conference on Recent advances in constraints
Distance sensitive snapshots in wireless sensor networks
OPODIS'07 Proceedings of the 11th international conference on Principles of distributed systems
Asynchronous inter-level forward-checking for DisCSPs
CP'09 Proceedings of the 15th international conference on Principles and practice of constraint programming
Help when needed, but no more: efficient read/write partial snapshot
DISC'09 Proceedings of the 23rd international conference on Distributed computing
Co-ordination in artificial agent societies: social structures and its implications for autonomous problem-solving agents
A general method to make multi-clock system deterministic
Proceedings of the Conference on Design, Automation and Test in Europe
A flexible checkpoint/restart model in distributed systems
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
CONCUR'10 Proceedings of the 21st international conference on Concurrency theory
Designing execution control in programs with global application states monitoring
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part II
Checkpoint/restart-enabled parallel debugging
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
Recent advances in checkpoint/recovery systems
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Plan switching: an approach to plan execution in changing environments
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Piccolo: building fast, distributed programs with partitioned tables
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Modeling and analyzing periodic distributed computations
SSS'10 Proceedings of the 12th international conference on Stabilization, safety, and security of distributed systems
Safe flocking in spite of actuator faults
SSS'10 Proceedings of the 12th international conference on Stabilization, safety, and security of distributed systems
Aspect-oriented checkpointing approach of composed web services
ICWE'10 Proceedings of the 10th international conference on Current trends in web engineering
Self-stabilizing Byzantine asynchronous unison
OPODIS'10 Proceedings of the 14th international conference on Principles of distributed systems
International Journal of Communication Networks and Distributed Systems
Reliable distributed data stream management in mobile environments
Information Systems
VMCAI'11 Proceedings of the 12th international conference on Verification, model checking, and abstract interpretation
Revisiting and improving a result on integrity preservation by concurrent transactions
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems
Macro and micro context-awareness for autonomic pervasive computing
Proceedings of the 12th International Conference on Information Integration and Web-based Applications & Services
Detecting Locally Distributed Predicates
ACM Transactions on Autonomous and Adaptive Systems (TAAS)
A hybrid fault tolerance technique in grid computing system
The Journal of Supercomputing
Hybrid checkpointing using emerging nonvolatile memories for future exascale systems
ACM Transactions on Architecture and Code Optimization (TACO)
New & efficient low overheads algorithm for mobile distributed systems
Proceedings of the International Conference & Workshop on Emerging Trends in Technology
Concurrency among strangers: programming in E as plan coordination
TGC'05 Proceedings of the 1st international conference on Trustworthy global computing
Fast checkpoint recovery algorithms for frequently consistent applications
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Trebuchet: exploring TLP with dataflow virtualisation
International Journal of High Performance Systems Architecture
Toward generating reducible replay logs
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Distributed and Parallel Databases
Boosting distributed constraint satisfaction
Journal of Heuristics
ScatterD: Spatial deployment optimization with hybrid heuristic/evolutionary algorithms
ACM Transactions on Autonomous and Adaptive Systems (TAAS)
Monitoring distributed systems using knowledge
FMOODS'11/FORTE'11 Proceedings of the joint 13th IFIP WG 6.1 and 30th IFIP WG 6.1 international conference on Formal techniques for distributed systems
Correlated set coordination in fault tolerant message logging protocols
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Distributed constraint programming with agents
ICAIS'11 Proceedings of the Second international conference on Adaptive and intelligent systems
SSS'11 Proceedings of the 13th international conference on Stabilization, safety, and security of distributed systems
Help when needed, but no more: Efficient read/write partial snapshot
Journal of Parallel and Distributed Computing
A global snapshot collection algorithm with concurrent initiators with non-FIFO channel
ICA3PP'11 Proceedings of the 11th international conference on Algorithms and architectures for parallel processing - Volume Part I
Distributed implementation of systems with multiparty interactions and priorities
SEFM'11 Proceedings of the 9th international conference on Software engineering and formal methods
Parallel solution of the obstacle problem in Grid environments
International Journal of High Performance Computing Applications
A proxy based efficient checkpointing scheme for fault recovery in mobile grid system
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Empire of Colonies: self-stabilizing and self-organizing distributed algorithms
OPODIS'06 Proceedings of the 10th international conference on Principles of Distributed Systems
Dynamic virtual clustering with xen and moab
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
ICDCN'06 Proceedings of the 8th international conference on Distributed Computing and Networking
Checkpointing and communication pattern-neutral algorithm for removing messages logged by senders
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Computational efficiency and practical implications for a client grid
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
An asynchronous recovery algorithm based on a staggered quasi-synchronous checkpointing algorithm
IWDC'05 Proceedings of the 7th international conference on Distributed Computing
Self-stabilizing checkpointing algorithm in ring topology
IWDC'05 Proceedings of the 7th international conference on Distributed Computing
Self-refined fault tolerance in HPC using dynamic dependent process groups
IWDC'05 Proceedings of the 7th international conference on Distributed Computing
Requirements for secure logging of decentralized cross-organizational workflow executions
OTM'05 Proceedings of the 2005 OTM Confederated international conference on On the Move to Meaningful Internet Systems
Self-stabilization of byzantine protocols
SSS'05 Proceedings of the 7th international conference on Self-Stabilizing Systems
The generalized deadlock resolution problem
ICALP'05 Proceedings of the 32nd international conference on Automata, Languages and Programming
Immediate detection of predicates in pervasive environments
Journal of Parallel and Distributed Computing
Predicate detection using event streams in ubiquitous environments
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Global state detection based on peer-to-peer interactions
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Nonintrusive snapshots using thin slices
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
A fault-tolerant multi-agent development framework
ISPA'04 Proceedings of the Second international conference on Parallel and Distributed Processing and Applications
Monitoring stable properties in dynamic peer-to-peer distributed systems
FSTTCS '05 Proceedings of the 25th international conference on Foundations of Software Technology and Theoretical Computer Science
A checkpoint/recovery model for heterogeneous dataflow computations using work-stealing
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
TACAS'05 Proceedings of the 11th international conference on Tools and Algorithms for the Construction and Analysis of Systems
Implementing rollback-recovery coordinated checkpoints
ISSADS'05 Proceedings of the 5th international conference on Advanced Distributed Systems
Performance evaluation of consistent recovery protocols using MPICH-GF
EDCC'05 Proceedings of the 5th European conference on Dependable Computing
Transparent fault tolerance for grid applications
EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing
Solving collaborative fuzzy agents problems with CLP(FD)
PADL'05 Proceedings of the 7th international conference on Practical Aspects of Declarative Languages
Efficient reduction for wait-free termination detection in a crash-prone distributed system
DISC'05 Proceedings of the 19th international conference on Distributed Computing
Plausible clocks with bounded inaccuracy
DISC'05 Proceedings of the 19th international conference on Distributed Computing
A model for detecting "global footprint anomalies" in a grid environment
PAISI'10 Proceedings of the 2010 Pacific Asia conference on Intelligence and Security Informatics
Stable predicate detection in dynamic systems
OPODIS'05 Proceedings of the 9th international conference on Principles of Distributed Systems
FM'06 Proceedings of the 14th international conference on Formal Methods
Rigorous fault tolerance using aspects and formal methods
Rigorous Development of Complex Fault-Tolerant Systems
Distributed garbage collection for mobile actor systems: the pseudo root approach
GPC'06 Proceedings of the First international conference on Advances in Grid and Pervasive Computing
MadLINQ: large-scale distributed matrix computation for the cloud
Proceedings of the 7th ACM european conference on Computer Systems
SIROCCO'09 Proceedings of the 16th international conference on Structural Information and Communication Complexity
Monitoring for hierarchical web services compositions
TES'05 Proceedings of the 6th international conference on Technologies for E-Services
Analysis of interval-based global state detection
ICDCIT'05 Proceedings of the Second international conference on Distributed Computing and Internet Technology
Efficient model checking for LTL with partial order snapshots
TACAS'06 Proceedings of the 12th international conference on Tools and Algorithms for the Construction and Analysis of Systems
Automated systematic testing of open distributed programs
FASE'06 Proceedings of the 9th international conference on Fundamental Approaches to Software Engineering
Distributed GraphLab: a framework for machine learning and data mining in the cloud
Proceedings of the VLDB Endowment
Research note: Self-stabilizing byzantine asynchronous unison
Journal of Parallel and Distributed Computing
On time complexity of distributed algorithms for generalized deadlock detection
ADBIS'97 Proceedings of the First East-European conference on Advances in Databases and Information systems
Impact of over-decomposition on coordinated checkpoint/rollback protocol
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Research: Debugging tool for distributed Estelle programs
Computer Communications
Research: Modified distributed snapshots algorithm for protocol stabilization
Computer Communications
Optimal checkpointing interval of a communication system with rollback recovery
Mathematical and Computer Modelling: An International Journal
Memory management for many-core processors with software configurable locality policies
Proceedings of the 2012 international symposium on Memory Management
Theoretical Computer Science
Multimedia Tools and Applications
Ensuring reliability in B2B services: Fault tolerant inter-organizational workflows
Information Systems Frontiers
Composable reliability for asynchronous systems
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
Multi-agent A* for parallel and distributed systems
Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
Conservative synchronization methods for parallel DEVS and Cell-DEVS
Proceedings of the 2011 Summer Computer Simulation Conference
Fundamenta Informaticae - Special Issue on ASM'05
SIROCCO'12 Proceedings of the 19th international conference on Structural Information and Communication Complexity
Adding Partial Orders to Linear Temporal Logic
Fundamenta Informaticae
Alleviating scalability issues of checkpointing protocols
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Looking for a definition of dynamic distributed systems
PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology
Detecting temporal logic predicates on distributed computations
DISC'07 Proceedings of the 21st international conference on Distributed Computing
DISC'12 Proceedings of the 26th international conference on Distributed Computing
ICFEM'12 Proceedings of the 14th international conference on Formal Engineering Methods: formal methods and software engineering
The viability of using compression to decrease message log sizes
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops
Efficient distributed snapshots in an anonymous asynchronous message-passing system
Journal of Parallel and Distributed Computing
Failure recovery: when the cure is worse than the disease
HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Leveraging SDN layering to systematically troubleshoot networks
Proceedings of the second ACM SIGCOMM workshop on Hot topics in software defined networking
A low complexity coordination architecture for networked supervisory medical systems
Proceedings of the ACM/IEEE 4th International Conference on Cyber-Physical Systems
Distributed wait state tracking for runtime MPI deadlock detection
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
The Journal of Supercomputing
International Journal of High Performance Computing Applications
Post-failure recovery of MPI communication capability: Design and rationale
International Journal of High Performance Computing Applications
Proceedings of the 4th annual Symposium on Cloud Computing
Towards privacy-preserving fault detection
Proceedings of the 9th Workshop on Hot Topics in Dependable Systems
HotSnap: a hot distributed snapshot system for virtual machine cluster
LISA'13 Proceedings of the 27th international conference on Large Installation System Administration
Specification and Verification of Concurrent Programs Through Refinements
Journal of Automated Reasoning
Compiler-Assisted Checkpointing of Parallel Codes: The Cetus and LLVM Experience
International Journal of Parallel Programming
Detecting stable locality-aware predicates
Journal of Parallel and Distributed Computing
Modeling, analyzing and slicing periodic distributed computations
Information and Computation
Libra: divide and conquer to verify forwarding tables in huge networks
NSDI'14 Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation
This paper presents an algorithm by which a process in a distributed system determines a global state of the system during a computation. Many problems in distributed systems can be cast in terms of the problem of detecting global states. For instance, the global state detection algorithm helps to solve an important class of problems: stable property detection. A stable property is one that persists: once a stable property becomes true it remains true thereafter. Examples of stable properties are “computation has terminated,” “the system is deadlocked” and “all tokens in a token ring have disappeared.” The stable property detection problem is that of devising algorithms to detect a given stable property. Global state detection can also be used for checkpointing.
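The abstract gives no algorithmic detail, so the following is only an illustrative sketch of one way a global state might be recorded: a marker-passing scheme over FIFO channels, simulated in a single Python program. The class names (`Process`, `Network`), the token-transfer workload, and the delivery order are all assumptions made for this example and are not taken from the paper. The recorded global state consists of each process's recorded local state plus the messages recorded as in transit on each channel.

```python
from collections import deque

MARKER = "MARKER"  # hypothetical control message separating pre/post-snapshot traffic


class Process:
    def __init__(self, name, tokens):
        self.name = name
        self.tokens = tokens      # live local state
        self.recorded = None      # local state captured by the snapshot
        self.chan_record = {}     # sender name -> messages recorded as in transit
        self.recording = set()    # incoming channels still being recorded


class Network:
    """Fully connected processes with one FIFO channel per ordered pair (assumed)."""

    def __init__(self, tokens):
        self.procs = {n: Process(n, t) for n, t in tokens.items()}
        self.chans = {(a, b): deque() for a in tokens for b in tokens if a != b}

    def send(self, src, dst, amount):
        self.procs[src].tokens -= amount
        self.chans[(src, dst)].append(amount)

    def _record(self, p):
        # Record local state, start recording every incoming channel,
        # and send a marker on every outgoing channel.
        p.recorded = p.tokens
        p.recording = {s for (s, d) in self.chans if d == p.name}
        p.chan_record = {s: [] for s in p.recording}
        for (s, _), ch in self.chans.items():
            if s == p.name:
                ch.append(MARKER)

    def start_snapshot(self, name):
        self._record(self.procs[name])

    def deliver(self, src, dst):
        msg = self.chans[(src, dst)].popleft()
        p = self.procs[dst]
        if msg == MARKER:
            if p.recorded is None:
                self._record(p)       # first marker seen: record own state now
            p.recording.discard(src)  # this channel's recorded state is complete
        else:
            p.tokens += msg
            if src in p.recording:    # arrived after recording, before the marker
                p.chan_record[src].append(msg)

    def snapshot_total(self):
        # Tokens in recorded process states plus tokens recorded in channels.
        return sum(
            p.recorded + sum(sum(msgs) for msgs in p.chan_record.values())
            for p in self.procs.values()
        )


# Assumed scenario: two processes holding 10 tokens each.
net = Network({"P": 10, "Q": 10})
net.send("P", "Q", 3)      # 3 tokens in flight from P to Q
net.start_snapshot("P")    # P records its state and sends a marker to Q
net.send("Q", "P", 4)      # Q has not seen the marker yet
net.deliver("P", "Q")      # Q receives the 3 tokens
net.deliver("P", "Q")      # Q receives the marker and records its state
net.deliver("Q", "P")      # P receives 4 tokens and records them as in transit
net.deliver("Q", "P")      # P receives Q's marker; the snapshot is complete
print(net.snapshot_total())
```

In this sketch the recorded state accounts for every token exactly once, which is what makes it usable for evaluating a stable property such as “all tokens in a token ring have disappeared”: a detector would evaluate the predicate on the recorded process and channel states rather than on the live system.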