Reliability has become a first-class design consideration for architects, alongside performance and energy efficiency. Continued technology scaling and the accompanying supply-voltage reductions are increasing the susceptibility of architectures to soft errors. However, mechanisms that achieve full error coverage usually degrade performance to a degree that is unacceptable for the majority of common users. Simultaneous and Redundantly Threaded (SRT) [13] is a fault-tolerant architecture in which pairs of threads in an SMT core redundantly execute the same program instructions. In this paper, we study the under-explored architectural support that SRT needs to reliably execute shared-memory applications. We show how atomic operations induce a serialization point between master and slave threads. This bottleneck has an impact of 34% on execution speed for several parallel scientific benchmarks. We propose an alternative mechanism in which the L1 cache is updated by the master's stores before verification, reducing the overhead up to 21%. Our approach also outperforms other recent proposals such as DCC, with a decrease of 8% in execution speed.