Toward recovery-oriented computing

Authors:
Armando Fox
Affiliations:
Stanford University, Stanford, California
Venue:
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Year:
2002

Abstract

Recovery Oriented Computing (ROC) is a joint research effort between Stanford University and the University of California, Berkeley. ROC takes the perspective that hardware faults, software bugs, and operator errors are facts to be coped with, not problems to be solved. This perspective is supported both by historical evidence and by recent studies on the main sources of outages in production systems. By concentrating on reducing Mean Time to Repair (MTTR) rather than increasing Mean Time to Failure (MTTF), ROC reduces recovery time and thus offers higher availability. We describe the principles and philosophy behind the joint Stanford/Berkeley ROC effort and outline some of its research areas and current projects.
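The MTTR-versus-MTTF argument can be made concrete with the standard steady-state availability formula, A = MTTF / (MTTF + MTTR). The abstract does not state this formula, so treat it as an assumption here. The sketch below compares a tenfold MTTF improvement with a tenfold MTTR reduction. The function name `availability` and all hour values are hypothetical and chosen only for illustration. They are not measurements from the paper.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up.

    Assumes the textbook model A = MTTF / (MTTF + MTTR); not taken from the paper.
    """
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical baseline: a failure every 1,000 h, each taking 10 h to repair.
baseline = availability(1000.0, 10.0)

# Traditional approach: make failures 10x rarer (raise MTTF).
higher_mttf = availability(10000.0, 10.0)

# Recovery-oriented approach: make repair 10x faster (lower MTTR).
lower_mttr = availability(1000.0, 1.0)

for label, a in [("baseline", baseline),
                 ("10x higher MTTF", higher_mttf),
                 ("10x lower MTTR", lower_mttr)]:
    print(f"{label:>16}: {a:.5f}")
```

Because A = 1 / (1 + MTTR/MTTF) depends only on the ratio of the two quantities, both options raise availability from about 99.0% to about 99.9%. This is consistent with the abstract's point that shortening recovery is as effective a lever as preventing failures, and it is one a system designer may be able to pull even when faults cannot be eliminated.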