Debugging standard ML without reverse engineering
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
Real-time, concurrent checkpoint for parallel programs
PPOPP '90 Proceedings of the second ACM SIGPLAN symposium on Principles & practice of parallel programming
Debuggable concurrency extensions for standard ML
PADD '91 Proceedings of the 1991 ACM/ONR workshop on Parallel and distributed debugging
Database transaction models for advanced applications
Concepts and applications of multilevel transactions and open nested transactions
Database transaction models for advanced applications
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Composing first-class transactions
ACM Transactions on Programming Languages and Systems (TOPLAS)
Efficient optimistic concurrency control using loosely synchronized clocks
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Concurrent programming in ML
On optimistic methods for concurrency control
ACM Transactions on Database Systems (TODS)
ACM Transactions on Computer Systems (TOCS)
Performance analysis of checkpointing strategies
ACM Transactions on Computer Systems (TOCS)
CLIP: a checkpointing tool for message-passing parallel programs
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
A survey of rollback-recovery protocols in message-passing systems
ACM Computing Surveys (CSUR)
Automated application-level checkpointing of MPI programs
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
A User-level Checkpointing Library for POSIX Threads Programs
FTCS '99 Proceedings of the Twenty-Ninth Annual International Symposium on Fault-Tolerant Computing
Software transactional memory for dynamic-sized data structures
Proceedings of the twenty-second annual symposium on Principles of distributed computing
Selective Checkpointing and Rollbacks in Multithreaded Distributed Systems
ICDCS '01 Proceedings of the The 21st International Conference on Distributed Computing Systems
On Page-Based Optimistic Process Checkpointing
IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems
NESTED TRANSACTIONS: AN APPROACH TO RELIABLE DISTRIBUTED COMPUTING
NESTED TRANSACTIONS: AN APPROACH TO RELIABLE DISTRIBUTED COMPUTING
Compiler-Assisted Checkpointing
Compiler-Assisted Checkpointing
Language support for lightweight transactions
OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications
Kill-safe synchronization abstractions
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Adaptive incremental checkpointing for massively parallel systems
Proceedings of the 18th annual international conference on Supercomputing
Searching for deadlocks while debugging concurrent haskell programs
Proceedings of the ninth ACM SIGPLAN international conference on Functional programming
Preemption-Based Avoidance of Priority Inversion for Java
ICPP '04 Proceedings of the 2004 International Conference on Parallel Processing
Application-level checkpointing for shared memory programs
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Theoretical foundations for compensations in flow composition languages
Proceedings of the 32nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Composable memory transactions
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
AtomCaml: first-class atomicity via rollback
Proceedings of the tenth ACM SIGPLAN international conference on Functional programming
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
A transactional object calculus
Science of Computer Programming
Proceedings of the eleventh ACM SIGPLAN international conference on Functional programming
Microreboot — A technique for cheap recovery
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Hi-index | 0.00 |
Transient faults that arise in large-scale software systems can often be repaired by re-executing the code in which they occur. Ascribing a meaningful semantics for safe re-execution in multi-threaded code is not obvious, however. For a thread to correctly re-execute a region of code, it must ensure that all other threads which have witnessed its unwanted effects within that region are also reverted to a meaningful earlier state. If not done properly, data inconsistencies and other undesirable behavior may result. However, automatically determining what constitutes a consistent global checkpoint is not straightforward since thread interactions are a dynamic property of the program. In this paper, we present a safe and efficient checkpointing mechanism for Concurrent ML (CML) that can be used to recover from transient faults. We introduce a new linguistic abstraction called stabilizers that permits the specification of per-thread monitors and the restoration of globally consistent checkpoints. Global states are computed through lightweight monitoring of communication events among threads (e.g. message-passing operations or updates to shared variables). Our checkpointing abstraction provides atomicity and isolation guarantees during state restoration ensuring restored global states are safe. Our experimental results on several realistic, multithreaded, server-style CML applications, including a web server and a windowing toolkit, show that the overheads to use stabilizers are small, and lead us to conclude that they are a viable mechanism for defining safe checkpoints in concurrent functional programs. Our experiments conclude with a case study illustrating how to build open nested transactions from our checkpointing mechanism.