Toward common patterns for distributed, concurrent, fault-tolerant code

Authors: Ryan Stutsman, John Ousterhout
Affiliation: Stanford University (both authors)
Venue: HotOS'13 Proceedings of the 14th USENIX conference on Hot Topics in Operating Systems
Year: 2013

Abstract

There are no widely accepted design patterns for writing distributed, concurrent, fault-tolerant code. Each programmer develops her own techniques for writing this type of complex software. A common pattern for fault-tolerant programming has the potential to help developers produce correct code more quickly and to increase shared understanding among them. We describe rules, tasks, and pools: patterns extracted from the development of RAMCloud, a fault-tolerant datacenter storage system. We illustrate their application and discuss their relationship to concurrent programming models. Our goal is to generate discussion that will ultimately lead to common techniques for fault-tolerant programming.