Modeling and analysis of fault-tolerant distributed memories for networks-on-chip

Authors:
Abbas BanaiyanMofrad;Nikil Dutt;Gustavo Girão
Affiliations:
University of California, Irvine, CA;University of California, Irvine, CA;Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Venue:
Proceedings of the Conference on Design, Automation and Test in Europe
Year:
2013

Abstract

Advances in technology scaling make Networks-on-Chip (NoCs) increasingly susceptible to failures, raising a range of reliability challenges. As on-chip memories occupy a growing share of chip area, maintaining fault tolerance of distributed on-chip memories becomes a major design challenge. We propose a system-level design methodology for scalable fault tolerance of distributed on-chip memories in NoCs. We introduce a novel reliability clustering model for fault-tolerance analysis and shared redundancy management of on-chip memory blocks. We perform extensive design space exploration, applying the proposed reliability clustering to a block-redundancy fault-tolerant scheme to evaluate the tradeoffs among reliability, performance, and overheads. Evaluations on a 64-core chip multiprocessor (CMP) with an 8x8 mesh NoC show that distinct strategies in our case study may yield up to 20% improvement in performance and 25% improvement in energy savings across different benchmarks, and uncover interesting design configurations.
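
The abstract does not spell out the reliability clustering model. As a rough illustration of why sharing redundant blocks within a cluster matters, the sketch below uses a textbook k-out-of-n binomial model, which is an assumption here and not necessarily the authors' formulation. The function names (`cluster_survival`, `chip_survival`), the independent per-block failure probability `p_fail`, and the cluster sizes are all hypothetical.

```python
from math import comb


def cluster_survival(n_blocks: int, n_spares: int, p_fail: float) -> float:
    """Probability that a cluster still has n_blocks working blocks.

    Assumption (not from the paper): every block, spare or not, fails
    independently with probability p_fail, and any spare in the cluster
    can replace any failed block in that cluster.
    """
    total = n_blocks + n_spares
    # The cluster survives if at most n_spares of its blocks have failed.
    return sum(
        comb(total, k) * p_fail**k * (1 - p_fail) ** (total - k)
        for k in range(n_spares + 1)
    )


def chip_survival(n_clusters: int, n_blocks: int, n_spares: int, p_fail: float) -> float:
    """Probability that every cluster on the chip survives (clusters independent)."""
    return cluster_survival(n_blocks, n_spares, p_fail) ** n_clusters


if __name__ == "__main__":
    # 64 memory blocks and 16 spares in total, partitioned two ways.
    small = chip_survival(n_clusters=16, n_blocks=4, n_spares=1, p_fail=0.05)
    large = chip_survival(n_clusters=4, n_blocks=16, n_spares=4, p_fail=0.05)
    print(f"16 clusters of 4+1: {small:.4f}")
    print(f" 4 clusters of 16+4: {large:.4f}")
```

Under these assumptions, pooling the same number of spares into fewer, larger clusters never lowers the survival probability, since any failure pattern tolerated by the finer partition is also tolerated by the coarser one. Larger clusters would presumably place spares farther from the blocks they replace in the mesh, which is one plausible source of the performance and overhead tradeoffs the abstract mentions.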