Fault-tolerant computer system design
Fault-tolerant computer system design
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Cost reduction and evaluation of temporary faults detecting technique
DATE '00 Proceedings of the conference on Design, automation and test in Europe
Modeling the Effect of Technology Trends on the Soft Error Rate of Combinational Logic
DSN '02 Proceedings of the 2002 International Conference on Dependable Systems and Networks
Evaluation of a 32-bit Microprocessor with Built-In Concurrent Error-Detection
FTCS '97 Proceedings of the 27th International Symposium on Fault-Tolerant Computing (FTCS '97)
Time Redundancy Based Soft-Error Tolerance to Rescue Nanometer Technologies
VTS '99 Proceedings of the 1999 17TH IEEE VLSI Test Symposium
Partial Error Masking to Reduce Soft Error Failure Rate in Logic Circuits
DFT '03 Proceedings of the 18th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Razor: A Low-Power Pipeline Based on Circuit-Level Timing Speculation
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Cache Scrubbing in Microprocessors: Myth or Necessity?
PRDC '04 Proceedings of the 10th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC'04)
Soft error and energy consumption interactions: a data cache perspective
Proceedings of the 2004 international symposium on Low power electronics and design
SWIFT: Software Implemented Fault Tolerance
Proceedings of the international symposium on Code generation and optimization
Compiler-Directed Instruction Duplication for Soft Error Detection
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Soft Errors in Advanced Computer Systems
IEEE Design & Test
Design and Evaluation of Hybrid Fault-Detection Systems
Proceedings of the 32nd annual international symposium on Computer Architecture
Impact of Soft Error Challenge on SoC Design
IOLTS '05 Proceedings of the 11th IEEE International On-Line Testing Symposium
An Error Detection and Correction Scheme for RAMs with Partial-Write Function
MTDT '05 Proceedings of the 2005 IEEE International Workshop on Memory Technology, Design, and Testing
Replication Cache: A Small Fully Associative Cache to Improve Data Cache Reliability
IEEE Transactions on Computers
Computing Cache Vulnerability to Transient Errors and Its Implication
DFT '05 Proceedings of the 20th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems
ExtraVirt: detecting and recovering from transient processor faults
Proceedings of the twentieth ACM symposium on Operating systems principles
Compiler-directed selective data protection against soft errors
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Area-efficient error protection for caches
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Combinational Logic Soft Error Analysis and Protection
IOLTS '06 Proceedings of the 12th IEEE International Symposium on On-Line Testing
Mitigating soft error failures for multimedia applications by selective data protection
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Task scheduling for reliable cache architectures of multiprocessor systems
Proceedings of the conference on Design, automation and test in Europe
Balancing Performance and Reliability in the Memory Hierarchy
ISPASS '05 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software, 2005
ISQED '08 Proceedings of the 9th international symposium on Quality Electronic Design
Partially protected caches to reduce failures due to soft errors in multimedia applications
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Smart cache cleaning: energy efficient vulnerability reduction in embedded processors
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Enabling energy efficient reliability in embedded systems through smart cache cleaning
ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Hi-index | 0.01 |
Increasing exponentially with technology scaling, the soft error rate even in earth-bound embedded systems manufactured in deep subnanometer technology is projected to become a serious design consideration. Partially protected cache (PPC) is a promising microarchitectural feature to mitigate failures due to soft errors in power, performance, and cost sensitive embedded processors. A processor with PPC maintains two caches, one protected and the other unprotected, both at the same level of memory hierarchy. The intuition behind PPCs is that not all data in the application is equally prone to soft errors. By finding and mapping the data that is more prone to soft errors to the protected cache, and error-resilient data to the unprotected cache, failures induced by soft errors can be significantly reduced at a minimal power and performance penalty. Consequently, the effectiveness of PPCs critically hinges on the compiler's ability to partition application data into error-prone and error-resilient data. The effectiveness of PPCs has previously been demonstrated on multimedia applications—where an obvious partitioning of data exists, the multimedia data is inherently resilient to soft errors, and the rest of the data and the entire code is assumed to be error-prone. Since the amount of multimedia data is a quite significant component of the entire application data, this obvious partitioning is quite effective. However, no such obvious data and code partitioning exists for general applications. This severely restricts the applicability of PPCs to data caches and instruction caches in general. This article investigates vulnerability-based partitioning schemes that are applicable to applications in general and effectively reduce failures due to soft errors at minimal power and performance overheads. Our experimental results on an HP iPAQ-like processor enhanced with PPC architecture, running benchmarks from the MiBench suite demonstrate that our partitioning heuristic efficiently finds page partitions for data PPCs that can reduce the failure rate by 48% at only 2% performance and 7% energy overhead, and finds page partitions for instruction PPCs that reduce the failure rate by 50% at only 2% performance and 8% energy overhead, on average.