ROC-1: Hardware Support for Recovery-Oriented Computing

  • Authors:
  • David Oppenheimer; Aaron Brown; James Beck; Daniel Hettena; Jon Kuroda; Noah Treuhaft; David A. Patterson; Katherine Yelick

  • Affiliations:
  • Univ. of California, Berkeley; Univ. of California, Berkeley; Univ. of California, Berkeley; Univ. of California, Berkeley; Univ. of California, Berkeley; Univ. of California, Berkeley; Univ. of California, Berkeley; Univ. of California, Berkeley

  • Venue:
  • IEEE Transactions on Computers - Special issue on fault-tolerant embedded systems
  • Year:
  • 2002

Abstract

We introduce the ROC-1 hardware platform, a large-scale cluster system designed to provide high availability for Internet service applications. The ROC-1 prototype embodies our philosophy of Recovery-Oriented Computing (ROC) by emphasizing detection and recovery from the failures that inevitably occur in Internet service environments, rather than simple avoidance of such failures. ROC-1 promises greater availability than existing server systems by incorporating four techniques applied from the ground up to both hardware and software: redundancy and isolation, online self-testing and verification, support for problem diagnosis, and concern for human interaction with the system.