Reaching Agreement in the Presence of Faults
Journal of the ACM (JACM)
Automated application-level checkpointing of MPI programs
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on Pessimistic Sender Based Message Logging
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Mace: language support for building distributed systems
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Fluxo: a system for internet service programming by non-expert developers
Proceedings of the 1st ACM symposium on Cloud computing
Finding latent performance bugs in systems implementations
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Life, death, and the critical transition: finding liveness bugs in systems code
NSDI'07 Proceedings of the 4th USENIX conference on Networked systems design & implementation
InContext: simple parallelism for distributed applications
Proceedings of the 20th international symposium on High performance distributed computing
Practical software model checking via dynamic interface reduction
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Composable reliability for asynchronous systems
USENIX ATC'12 Proceedings of the 2012 USENIX conference on Annual Technical Conference
EventWave: programming model and runtime support for tightly-coupled elastic cloud applications
Proceedings of the 4th annual Symposium on Cloud Computing
Hi-index | 0.00 |
An attractive approach to leveraging the ability of cloud-computing platforms to provide resources on demand is to build elastic applications, which can scale up or down based on resource requirements. To ease the development of elastic applications, it is useful for programmers to write applications with simple, inelastic semantics and rely on runtime systems to provide elasticity. While this approach has been useful in restricted domains, such as MapReduce, we argue in this paper that adding elasticity to general distributed applications introduces new fault-tolerance challenges that must be addressed at the programming model and run-time level. We discuss a programming model for writing elastic, distributed applications, and describe an approach to fault-tolerance that integrates with the elastic run-time to provide transparent elasticity and fault-tolerance.