Extreme technology scaling in silicon devices drastically affects reliability, particularly through runtime failures induced by transistor wearout. Current online testing mechanisms test all components in a microprocessor, including hardware that the running application has not exercised, and therefore incur high performance penalties. We propose a hybrid hardware/software online testing solution in which components that are heavily utilized by the software application are tested more thoroughly and more frequently. Our approach thus concentrates on the processor units that most affect application correctness, achieving high coverage while incurring minimal performance overhead. We also introduce a new metric, Application-Aware Fault Coverage, which measures a test's capability to detect faults that might have corrupted the state or the output of an application. Coverage is further improved by inserting observation points into the design. Evaluating our technique on a Sun OpenSPARC T1, we show that our solution maintains high Application-Aware Fault Coverage while reducing the performance overhead of online testing by more than a factor of 2 compared to solutions that are oblivious to application behavior. Specifically, our solution achieves 95% fault coverage with a minimal performance overhead (1.3%) and area impact (0.4%).
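
The abstract describes Application-Aware Fault Coverage only in words, so the sketch below illustrates one plausible reading of it rather than the paper's actual definition. It assumes the metric is the fraction of application-relevant faults (those that could corrupt the application's state or output) that the test detects, and contrasts it with conventional coverage computed over all faults. The function names and the fault labels are hypothetical.

```python
def plain_fault_coverage(all_faults, detected_faults):
    """Conventional coverage: detected faults over every fault in the design."""
    universe = set(all_faults)
    if not universe:
        raise ValueError("fault list is empty")
    return len(universe & set(detected_faults)) / len(universe)


def application_aware_fault_coverage(app_relevant_faults, detected_faults):
    """Assumed reading of the metric: detected faults over only those faults
    that could corrupt the application's state or output."""
    relevant = set(app_relevant_faults)
    if not relevant:
        raise ValueError("no application-relevant faults to measure against")
    return len(relevant & set(detected_faults)) / len(relevant)


# Hypothetical fault labels: unit name plus stuck-at value.
all_faults = {"alu.sa0", "alu.sa1", "fpu.sa0", "fpu.sa1", "lsu.sa0"}
# Assumption: the application exercises only the ALU and the load/store unit.
app_relevant = {"alu.sa0", "alu.sa1", "lsu.sa0"}
# Assumption: the test detects these three faults.
detected = {"alu.sa0", "alu.sa1", "fpu.sa0"}

plain = plain_fault_coverage(all_faults, detected)
aafc = application_aware_fault_coverage(app_relevant, detected)
```

Under these assumptions the two metrics diverge: detecting the FPU fault raises conventional coverage but contributes nothing to the application-aware figure, because the application never exercises the FPU.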