Configurable isolation: building high availability systems with commodity multi-core processors

  • Authors:
  • Nidhi Aggarwal;Parthasarathy Ranganathan;Norman P. Jouppi;James E. Smith

  • Affiliations:
  • University of Wisconsin-Madison, Madison, WI;Hewlett Packard Labs, Palo Alto, CA;Hewlett Packard Labs, Palo Alto, CA;University of Wisconsin-Madison, Madison, WI

  • Venue:
  • Proceedings of the 34th annual international symposium on Computer architecture
  • Year:
  • 2007

Abstract

High availability is an increasingly important requirement for enterprise systems, often valued more than performance. Systems designed for high availability typically use redundant hardware for error detection and continued uptime in the event of a failure. Chip multiprocessors with an abundance of identical resources like cores, caches, and interconnection networks would appear to be ideal building blocks for implementing high availability solutions on chip. However, doing so poses significant challenges with respect to error containment and faulty component replacement. Increasing silicon and transient fault rates with future technology scaling exacerbate the problem. This paper proposes a novel, cost-effective architecture for high availability systems built from future multi-core processors. We propose a new chip multiprocessor architecture that provides configurable isolation for fault containment and component retirement, based upon cost-effective modifications to commodity designs. The design is evaluated for a state-of-the-art industrial fault model, and the proposed architecture is shown to provide effective fault isolation and graceful degradation even when the failure rate is high.