Tolerating operational faults in cluster-based FPGAs

  • Authors:
  • Vijay Lakamraju;Russell Tessier

  • Affiliations:
  • Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA;Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA

  • Venue:
  • FPGA '00 Proceedings of the 2000 ACM/SIGDA eighth international symposium on Field programmable gate arrays
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

In recent years the application space of reconfigurable devices has grown to include many platforms with a strong need for fault tolerance. While these systems frequently contain hardware redundancy to allow for continued operation in the presence of operational faults, the need to recover faulty hardware and return it to full functionality quickly and efficiently is great. In addition to providing functional density, FPGAs provide a level of fault tolerance generally not found in mask-programmable devices by including the capability to reconfigure around operational faults in the field. In this paper, incremental CAD techniques are described that allow functional recovery of FPGA design configurations in the presence of single or multiple operational faults. Our preferred approach to fault recovery takes advantage of device routing hierarchy in architectural families such as Xilinx Virtex [2] and Altera Apex [3] to quickly swap unused logic and routing resources in place of faulty ones within logic clusters. These algorithms allow for straight-forward implementation within a local fault-tolerant system without the need to access a remote processing location. If initial recovery attempts through localized swapping fail, an incremental router based on the widely-used PathFinder maze routing algorithm [10] can be applied remotely in an attempt to form connections between newly-allocated logic and interconnect based on the history of the initial design route.