Architectural-Level Fault Tolerant Computation in Nanoelectronic Processors

Authors:
Wenjing Rao; Alex Orailoglu; Ramesh Karri
Affiliations:
UC San Diego, CSE Department; UC San Diego, CSE Department; Polytechnic University, ECE Department
Venue:
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Year:
2005

Abstract

Nanoelectronic devices are expected to have extremely high and variable fault rates, so future processor architectures built on these unreliable devices must embed fault tolerance to satisfy the fundamental requirement of computational correctness. In this paper, an architectural-level computation model is proposed for fault tolerant computation in nanoelectronic processors. The proposed scheme can guarantee the correctness of each instruction by exploiting both hardware and time redundancy, even under high and variable fault rates: each instruction is confirmed by multiple computation instances. By speculatively executing on unconfirmed results, the scheme eliminates the severe performance deterioration that time redundancy approaches typically cause on data dependent instructions. To avoid the exponential growth in resource allocation that hardware redundancy would otherwise introduce for these speculations, a hardware allocation framework is developed to control the growth of hardware resources while preserving the low latency achieved through speculative execution. We set up an experimental framework to validate the effectiveness of the proposed scheme and to investigate multiple tradeoff points within the approach. The experimental data confirm that the proposed approach provides fault tolerance in pipelined nanoelectronic processors while delivering high system performance and efficient utilization of hardware resources.
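
The idea of confirming each instruction through multiple computation instances can be illustrated with a minimal Python sketch. This is not the paper's mechanism: the single-bit-flip fault model, the agreement threshold, and the names `make_faulty_unit` and `confirm` are all assumptions introduced here for illustration, and neither speculative execution nor the hardware allocation framework is modeled.

```python
import random
from collections import Counter
from typing import Callable, Tuple


def make_faulty_unit(op: Callable[[int, int], int], fault_rate: float,
                     rng: random.Random, width: int = 32) -> Callable[[int, int], int]:
    """Wrap op as an unreliable functional unit (hypothetical fault model).

    Each call flips one random bit of the result with probability fault_rate.
    """
    mask = (1 << width) - 1

    def unit(a: int, b: int) -> int:
        result = op(a, b) & mask
        if rng.random() < fault_rate:
            result ^= 1 << rng.randrange(width)  # inject a single-bit fault
        return result

    return unit


def confirm(unit: Callable[[int, int], int], a: int, b: int,
            threshold: int = 2, max_instances: int = 64) -> Tuple[int, int]:
    """Re-run the unit until one result value has been produced `threshold` times.

    Returns (confirmed_result, instances_used). The instances may be read as
    repeated runs on one unit (time redundancy) or as runs on separate units
    (hardware redundancy); this sketch does not distinguish the two.
    """
    votes: Counter = Counter()
    for used in range(1, max_instances + 1):
        value = unit(a, b)
        votes[value] += 1
        if votes[value] >= threshold:
            return value, used
    raise RuntimeError("no result reached the confirmation threshold")


if __name__ == "__main__":
    adder = make_faulty_unit(lambda x, y: x + y, fault_rate=0.3,
                             rng=random.Random(1))
    result, used = confirm(adder, 20, 22, threshold=3)
    print(f"confirmed result {result} after {used} instances")
```

In this sketch a higher `threshold` makes it less likely that matching faulty results are confirmed, at the cost of more computation instances per instruction. That cost is the kind of latency overhead the abstract says speculation on unconfirmed results is meant to hide.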