Optimistic recovery in distributed systems
ACM Transactions on Computer Systems (TOCS)
Implementing fault-tolerant services using the state machine approach: a tutorial
ACM Computing Surveys (CSUR)
Transparent process migration: design alternatives and the sprite implementation
Software—Practice & Experience
Managing update conflicts in Bayou, a weakly connected replicated storage system
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
The MOSIX multicomputer operating system for high performance cluster computing
Future Generation Computer Systems - Special issue on HPCN '97
Resource containers: a new facility for resource management in server systems
OSDI '99 Proceedings of the third symposium on Operating systems design and implementation
CoCheck: Checkpointing and Process Migration for MPI
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
SOSP '83 Proceedings of the ninth ACM symposium on Operating systems principles
The design and implementation of Zap: a system for migrating computing environments
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
The Chubby lock service for loosely-coupled distributed systems
OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Sinfonia: a new paradigm for building scalable distributed systems
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Dynamo: amazon's highly available key-value store
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Transparent checkpoint-restart of multiple processes on commodity operating systems
ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
Remus: high availability via asynchronous virtual machine replication
NSDI'08 Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation
SnowFlock: rapid virtual machine cloning for cloud computing
Proceedings of the 4th ACM European conference on Computer systems
A Unified Framework for Load Distribution and Fault-Tolerance of Application Servers
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Respec: efficient online multiprocessor replayvia speculation and external determinism
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Pregel: a system for large-scale graph processing
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
ZooKeeper: wait-free coordination for internet-scale systems
USENIXATC'10 Proceedings of the 2010 USENIX conference on USENIX annual technical conference
An analysis of Linux scalability to many cores
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
PowerGraph: distributed graph-parallel computation on natural graphs
OSDI'12 Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation
RemusDB: transparent high availability for database systems
The VLDB Journal — The International Journal on Very Large Data Bases
Split/merge: system support for elastic execution in virtual middleboxes
nsdi'13 Proceedings of the 10th USENIX conference on Networked Systems Design and Implementation
Hi-index | 0.00 |
Software is modular, and so is run-time state. We argue that by allowing individual layers of the software stack to store isolated runtime state, we cripple the ability of systems to effectively scale or respond to failures. Given the strong desire to build elastic and highly available applications for the cloud, we propose Slice, an abstraction that allows applications to declare appropriate granularities of scale-oriented state, and allows layers to contribute the appropriate layer-specific data to those containers. Slices can be transparently migrated and replicated between application instances, thereby simplifying design of elastic and highly available systems, while retaining the modularity of modern software.