Advances in technology scaling are making emerging Chip MultiProcessor (CMP) platforms increasingly susceptible to failures, which raises a range of reliability challenges. In these platforms, error-prone on-chip memories (caches) continue to dominate the chip area. At the same time, Network-on-Chip (NoC) fabrics are increasingly used to manage the scalability of these architectures. We present a novel approach for efficiently implementing fault tolerance in the Last-Level Cache (LLC) of CMP architectures. The approach uses the interconnection network fabric to protect the LLC banks against permanent faults in an efficient and scalable way. When an LLC access targets a faulty block, the network detects and corrects the faults and returns fault-free data to the requesting core. By building on the NoC fabric, designers can implement any cache fault-tolerance scheme in a modular and scalable manner for emerging multicore/manycore platforms. We propose four policies for implementing a remapping-based fault-tolerant scheme on the NoC fabric under different settings. These policies trade off NoC traffic (the packets sent through the network) against the intrinsic parallelism of the communication mechanisms, so designers can tune the system to their design constraints. We perform an extensive design space exploration on NoC benchmarks to demonstrate the usability and efficacy of our approach. We also perform a sensitivity analysis of how the policies behave as the NoC architecture improves. The overheads of using the NoC fabric are minimal. On an 8-core, 16-cache-bank CMP, we demonstrate reliable access to the LLC with additional overheads of less than 3% in area and less than 7% in power.
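To make the idea of remapping faulty LLC blocks in the network more concrete, the following is a minimal Python sketch. It is a toy model under stated assumptions and does not reproduce any of the four policies proposed above. Everything in it is hypothetical and chosen only for illustration:

- the class name `RemappedLLC`;
- the reserved spare blocks per bank;
- the rule of placing a remapped block in a *different* bank;
- the row-major mesh layout with XY-routing hop counts;
- the assumption that the network interface holds the full fault map.

The sketch shows where the extra NoC cost of a redirected access comes from.

```python
from dataclasses import dataclass, field

Block = tuple[int, int]  # (bank id, block index within that bank)


@dataclass
class RemappedLLC:
    """Toy model of redirecting accesses to faulty LLC blocks in the network.

    Hypothetical assumptions (not the paper's design): bank ids are row-major
    node ids on a mesh of width `mesh_width`, the last `spares` blocks of each
    bank are reserved spares, and the full fault map is known up front.
    """
    num_banks: int
    blocks_per_bank: int
    spares: int
    mesh_width: int
    remap: dict = field(default_factory=dict)  # faulty Block -> spare Block

    @property
    def usable(self) -> int:
        # Number of addressable (non-spare) blocks per bank.
        return self.blocks_per_bank - self.spares

    def build_remap(self, faulty: set) -> None:
        """Map every faulty addressable block to a fault-free spare in another bank."""
        self.remap.clear()
        used: set = set()
        for bank, block in sorted(faulty):
            if block >= self.usable:
                continue  # a faulty spare is simply never handed out
            self.remap[(bank, block)] = self._find_spare(bank, faulty, used)

    def _find_spare(self, bank: int, faulty: set, used: set) -> Block:
        # Search banks bank+1, bank+2, ... (wrapping around) for a free, healthy spare.
        for step in range(1, self.num_banks):
            target = (bank + step) % self.num_banks
            for idx in range(self.usable, self.blocks_per_bank):
                cand = (target, idx)
                if cand not in faulty and cand not in used:
                    used.add(cand)
                    return cand
        raise RuntimeError("out of fault-free spare blocks")

    def hops(self, src: int, dst: int) -> int:
        # Manhattan distance between two mesh nodes (XY routing assumed).
        dx = abs(src % self.mesh_width - dst % self.mesh_width)
        dy = abs(src // self.mesh_width - dst // self.mesh_width)
        return dx + dy

    def route(self, src_node: int, bank: int, block: int) -> tuple[Block, int]:
        """Return the block actually accessed and the request's hop count."""
        if not 0 <= block < self.usable:
            raise IndexError("block index is reserved or out of range")
        dest = self.remap.get((bank, block), (bank, block))
        return dest, self.hops(src_node, dest[0])


# Usage: 16 banks on a 4x4 mesh, 8 blocks per bank, 2 of them reserved as spares.
llc = RemappedLLC(num_banks=16, blocks_per_bank=8, spares=2, mesh_width=4)
llc.build_remap({(5, 3), (6, 6)})
print(llc.route(0, 5, 3))  # ((6, 7), 3): redirected to a spare in bank 6
print(llc.route(0, 5, 2))  # ((5, 2), 2): healthy block, served by its home bank
```

In this toy model, spare (6, 6) is itself faulty, so the search skips it and hands out (6, 7). The redirected request then travels one hop farther than an access to a healthy block in bank 5. Placing spares elsewhere, or letting the home bank forward the request instead of the network interface, would shift the balance between packet count, hop count, and fault-map storage. That is the kind of traffic-versus-parallelism trade-off the proposed policies are described as exploring.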