NoC-based fault-tolerant cache design in chip multiprocessors
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Advances in technology scaling, coupled with aggressive voltage scaling, result in significant reliability challenges for emerging Chip Multiprocessor (CMP) platforms, where error-prone caches continue to dominate chip area. Network-on-Chip (NoC) fabrics are increasingly used to manage the scalability of these CMPs. We present a novel fault-tolerant scheme for the Last Level Cache (LLC) in CMP architectures that leverages the interconnection network to protect the LLC banks against permanent faults. During an LLC access to a faulty area, the network detects and corrects the faults and returns fault-free data to the requesting core. By leveraging the NoC fabric, we can implement any cache fault-tolerance scheme in an efficient, modular, and scalable manner. We perform extensive design space exploration on NoC benchmarks to demonstrate the utility and efficacy of our approach. The overheads of leveraging the NoC fabric are minimal: on an 8-core, 16-cache-bank CMP we demonstrate reliable access to LLCs with additional overheads of less than 3% in area and less than 7% in power.
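
The access path described in the abstract (a read to a faulty LLC region is repaired in the network before the data reaches the core) can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the paper's design: the abstract does not say which fault-tolerance scheme is used, so the example assumes a per-line fault map and a small redundant "patch" store held at the network side. The names `CacheBank`, `NetworkInterface`, `fault_map`, and `patch` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CacheBank:
    """Toy LLC bank in which some cells are permanently stuck at zero."""
    lines: dict = field(default_factory=dict)          # line address -> stored word
    stuck_at_zero: dict = field(default_factory=dict)  # line address -> faulty-bit mask

    def write(self, addr, word):
        self.lines[addr] = word

    def read(self, addr):
        # Faulty cells always read 0, whatever was written to them.
        return self.lines.get(addr, 0) & ~self.stuck_at_zero.get(addr, 0)


@dataclass
class NetworkInterface:
    """Hypothetical NoC-side wrapper that hides a bank's permanent faults."""
    bank: CacheBank
    fault_map: dict                            # assumed known, e.g. from testing
    patch: dict = field(default_factory=dict)  # redundant copy of faulty bits only

    def write(self, addr, word):
        mask = self.fault_map.get(addr, 0)
        if mask:
            # Keep the bits that the faulty cells cannot hold.
            self.patch[addr] = word & mask
        self.bank.write(addr, word)

    def read(self, addr):
        raw = self.bank.read(addr)
        mask = self.fault_map.get(addr, 0)
        if not mask:
            return raw  # fault-free line: pass the data through unchanged
        # Replace the faulty bit positions with the redundant copy.
        return (raw & ~mask) | self.patch.get(addr, 0)


# One bank with two stuck-at-zero cells in line 0x40; line 0x80 is fault-free.
bank = CacheBank(stuck_at_zero={0x40: 0b1010})
ni = NetworkInterface(bank=bank, fault_map={0x40: 0b1010})
ni.write(0x40, 0xFF)
ni.write(0x80, 0x3C)

raw_faulty = bank.read(0x40)  # what the core would see without protection
fixed = ni.read(0x40)         # what the core sees through the network
clean = ni.read(0x80)         # unaffected line
```

In this model the bank itself is unchanged and all repair logic sits in the wrapper, which mirrors the modularity argument in the abstract: a different scheme (for example an error-correcting code) could be swapped in at the same point without touching the bank.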