Recent advances in process technology increase the functionality per unit area of systems-on-chip (SoCs) by substantially shrinking device feature sizes. However, this shrinkage introduces new reliability issues, such as higher rates of single-event multi-bit upset (SMU) failures. To mitigate these failures, we propose a novel error mitigation mechanism based on a hybrid HW-SW technique. In our proposal, we reinforce the SoC SRAMs by adding a fault-tolerant memory buffer of minimal capacity to ensure error-free operation. The buffer temporarily stores a portion of the stored data, called a data chunk, which is used to restore another data chunk in a fully demand-driven way if the latter is faulty. We formulate the selection of the buffer and data chunk sizes as an optimization problem that minimizes energy overhead, subject to hard timing and area overhead constraints set beforehand by the system designers. We show that the proposed mitigation scheme achieves full error mitigation on a real SoC platform with an average energy overhead of 10.1% relative to baseline system operation, while meeting all design-time constraints.
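
The size selection described above can be illustrated with a minimal sketch: enumerate candidate data chunk sizes, discard those that violate the hard timing or area limits, and keep the feasible candidate with the lowest energy overhead. This is not the authors' formulation. The names `Overheads`, `select_chunk_size`, and `toy_model`, and every coefficient in the cost model, are hypothetical and chosen only for illustration. The sketch also assumes the buffer capacity is tied to the chunk size, so a single variable is searched.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Overheads:
    """Fractional overheads relative to a baseline system (hypothetical units)."""
    energy: float
    time: float
    area: float


def select_chunk_size(
    candidates: Iterable[int],
    model: Callable[[int], Overheads],
    max_time: float,
    max_area: float,
) -> Optional[Tuple[int, Overheads]]:
    """Return the feasible chunk size with the lowest energy overhead.

    Timing and area are treated as hard constraints: a candidate that
    exceeds either limit is rejected regardless of its energy overhead.
    Returns None when no candidate is feasible.
    """
    best: Optional[Tuple[int, Overheads]] = None
    for size in candidates:
        cost = model(size)
        if cost.time > max_time or cost.area > max_area:
            continue  # violates a hard constraint
        if best is None or cost.energy < best[1].energy:
            best = (size, cost)
    return best


def toy_model(chunk_words: int) -> Overheads:
    """ASSUMED cost model, not taken from the paper.

    Small chunks pay a per-chunk buffering cost more often (1.6 / size term);
    large chunks need a larger protected buffer and a longer restore
    (terms that grow linearly with size).
    """
    energy = 0.02 + 1.6 / chunk_words + 0.0004 * chunk_words
    time = 0.0005 * chunk_words
    area = 0.0002 * chunk_words
    return Overheads(energy=energy, time=time, area=area)


if __name__ == "__main__":
    sizes = [16, 32, 64, 128, 256]
    # Hypothetical designer limits: at most 5% timing and 5% area overhead.
    print(select_chunk_size(sizes, toy_model, max_time=0.05, max_area=0.05))
```

Exhaustive search is used here only because the candidate set is tiny; with measured overhead models for a specific platform, the same constraint check could sit inside any other search strategy.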