Fault Tolerant Schemes for Hot-Swappable and Non Hot-Swappable Mezzanine Cards

  • Authors:
  • Mark Lanus

  • Affiliations:
  • Availability Engineering Department, Motorola Embedded Communications Computing, 2900 S. Diablo Way, DW220, Tempe, AZ 85282, USA

  • Venue:
  • ISAS '07 Proceedings of the 4th international symposium on Service Availability
  • Year:
  • 2007

Abstract

First-generation highly available computer systems used a two-level physical hierarchy in which a shelf was composed of field replaceable units (FRUs), and the FRU was the unit of fault detection, fault isolation, fault containment, fault recovery, fault repair, and sparing. In 1995, IEEE introduced the non-hot-swappable PCI Mezzanine Card (PMC) draft standard [1], which allows fault detection, isolation, containment, recovery, and sparing to be implemented at the mezzanine card level but requires fault repair to occur at the carrier board level. In 2005, the PCI Industrial Computer Manufacturers Group (PICMG®) introduced the hot-swappable Advanced Mezzanine Card (AMC) standard [2], which extends the PMC model to allow all fault management functions, including fault repair, to be implemented at the mezzanine card level. This paper develops fault management strategies and availability models for three hardware architectures: monolithic, non-hot-swap partitioned, and hot-swap partitioned.
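
As a rough illustration of the kind of quantity an availability model produces, the sketch below uses the textbook steady-state formula A = MTTF / (MTTF + MTTR) and the product rule for components in series. It is not the model developed in the paper; the function names are hypothetical, and any input values a reader supplies are their own assumptions.

```python
from typing import Iterable


def steady_state_availability(mttf_hours: float, mttr_hours: float) -> float:
    """Textbook steady-state availability: MTTF / (MTTF + MTTR).

    Hypothetical helper for illustration; not taken from the paper.
    """
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


def series_availability(availabilities: Iterable[float]) -> float:
    """Availability of components that must all be up (independent failures assumed)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per 365-day year, in minutes."""
    return (1.0 - availability) * 365 * 24 * 60
```

Under these assumptions, shrinking the unit of repair from the carrier board to a single mezzanine card would appear in such a calculation as a shorter repair time, or as fewer functions taken out of service during a repair.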