Fault tolerance via idempotence

Authors:
Ganesan Ramalingam;Kapil Vaswani
Affiliations:
Microsoft Research, Bangalore, India;Microsoft Research, Bangalore, India
Venue:
POPL '13 Proceedings of the 40th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Year:
2013



Abstract

Building distributed services and applications is challenging due to the pitfalls of distribution such as process and communication failures. A natural solution to these problems is to detect potential failures and retry the failed computation and/or resend messages. Ensuring correctness in such an environment requires distributed services and applications to be idempotent. In this paper, we study the inter-related aspects of process failures, duplicate messages, and idempotence. We first introduce a simple core language (based on lambda calculus) inspired by modern distributed computing platforms. This language formalizes the notions of a service, duplicate requests, process failures, data partitioning, and local atomic transactions that are restricted to a single store. We then formalize a desired (generic) correctness criterion for applications written in this language, consisting of idempotence (which captures the desired safety properties) and failure-freedom (which captures the desired progress properties). We then propose language support in the form of a monad that automatically ensures failfree idempotence. A key characteristic of our implementation is that it is decentralized and does not require distributed coordination. We show that the language support can be enriched with other useful constructs, such as compensations, while retaining the coordination-free decentralized nature of the implementation. We have implemented the idempotence monad (and its variants) in F# and C# and used our implementation to build realistic applications on Windows Azure. We find that the monad has low runtime overheads and leads to more declarative applications.
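
To make the role of idempotence under retries concrete, the following is a minimal Python sketch. It is not the paper's monad or its F#/C# implementation. It assumes a single in-memory store whose hypothetical `transact` method stands in for a local atomic transaction, and a hypothetical `idempotent` helper that logs each request's result under its request identifier in the same transaction as the state update. All names here (`Store`, `transact`, `idempotent`, `deposit`) are illustrative assumptions.

```python
class Store:
    """A single in-memory store; `transact` stands in for a local atomic transaction."""

    def __init__(self):
        self.data = {}

    def transact(self, fn):
        # Run fn against a working copy and commit only if fn returns normally.
        # If fn raises (a simulated process failure), the store is left unchanged.
        working = dict(self.data)
        result = fn(working)
        self.data = working
        return result


def idempotent(store, request_id, step):
    """Run `step` at most once per request_id; duplicates receive the logged result."""

    def body(data):
        log_key = ("log", request_id)
        if log_key in data:
            # Duplicate request, or a retry after the first attempt committed.
            return data[log_key]
        result = step(data)      # the actual state update
        data[log_key] = result   # logged in the same transaction as the update
        return result

    return store.transact(body)


def deposit(amount):
    """Build a step that adds `amount` to the balance and returns the new balance."""

    def step(data):
        data["balance"] = data.get("balance", 0) + amount
        return data["balance"]

    return step


store = Store()
first = idempotent(store, "req-1", deposit(50))
retry = idempotent(store, "req-1", deposit(50))  # resent message with the same id
other = idempotent(store, "req-2", deposit(25))
print(first, retry, other, store.data["balance"])
```

In this sketch the resent request `req-1` does not deposit twice, because the logged result is returned instead of re-running the step. The sketch needs no coordination beyond the single store's transaction, which loosely mirrors the decentralized flavor described above, but it does not handle workflows that span multiple stores or partitions.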