Online fault tolerance for FPGA logic blocks

  • Authors:
  • John M. Emmert;Charles E. Stroud;Miron Abramovici

  • Affiliations:
  • Department of Electrical Engineering, Wright State University, Dayton, OH;Department of Electrical and Computer Engineering, Auburn University, AL;Design Automation for Flexible Chip Architectures, Framingham, MA

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Most adaptive computing systems use reconfigurable hardware in the form of field programmable gate arrays (FPGAs). For these systems to be fielded in harsh environments where high reliability and availability are a must, the applications running on the FPGAs must tolerate hardware faults that may occur during the lifetime of the system. In this paper, we present new fault-tolerant techniques for FPGA logic blocks, developed as part of the roving self-test areas (STARs) approach to online testing, diagnosis, and reconfiguration [1]. Our techniques can handle large numbers of faults (we show tolerance of over 100 logic faults via actual implementation on an FPGA consisting of a 20 × 20 array of logic blocks). A key novel feature is the reuse of defective logic blocks to increase the number of effective spares and extend the mission life. To increase fault tolerance, we not only use nonfaulty parts of defective or partially faulty logic blocks, but we also use faulty parts of defective logic blocks in nonfaulty modes. By using and reusing faulty resources, our multilevel approach extends the number of tolerable faults beyond the number of currently available spare logic resources. Unlike many column, row, or tile-based methods, our multilevel approach can tolerate not only faults that are evenly distributed over the logic area, but also clusters of faults in the same local area. Furthermore, system operation is not interrupted for fault diagnosis or for computing fault-bypassing configurations. Our fault tolerance techniques have been implemented using ORCA 2C series FPGAs which feature incremental dynamic runtime reconfiguration.