Software—Practice & Experience
Recovery Oriented Computing (ROC) is a joint research effort between Stanford University and the University of California, Berkeley. ROC takes the perspective that hardware faults, software bugs, and operator errors are facts to be coped with, not problems to be solved. This perspective is supported both by historical evidence and by recent studies on the main sources of outages in production systems. By concentrating on reducing Mean Time to Repair (MTTR) rather than increasing Mean Time to Failure (MTTF), ROC reduces recovery time and thus offers higher availability. We describe the principles and philosophy behind the joint Stanford/Berkeley ROC effort and outline some of its research areas and current projects.
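The link between MTTR and availability can be made concrete with the standard steady-state formula, availability = MTTF / (MTTF + MTTR). The sketch below is illustrative only: the formula is the textbook definition rather than something stated in the abstract, and the function name and the hour values are hypothetical, not measurements from the ROC project.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is up.

    Assumes the textbook definition MTTF / (MTTF + MTTR).
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical numbers chosen only to illustrate the trade-off.
baseline = availability(mttf_hours=1000.0, mttr_hours=1.0)
faster_repair = availability(mttf_hours=1000.0, mttr_hours=0.1)   # MTTR cut 10x
rarer_failure = availability(mttf_hours=10000.0, mttr_hours=1.0)  # MTTF raised 10x

print(f"baseline:      {baseline:.6f}")
print(f"faster repair: {faster_repair:.6f}")
print(f"rarer failure: {rarer_failure:.6f}")
```

Under this formula, cutting MTTR by a factor of ten yields the same availability as raising MTTF by a factor of ten, which is why an approach focused on fast recovery can be as effective as one focused on preventing failures.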