Replicating tag entries for reliability enhancement in cache tag arrays

  • Authors:
  • Shuai Wang; Jie Hu; Sotirios G. Ziavras

  • Affiliations:
  • Department of Computer Science and Technology, Nanjing University, Nanjing, China; Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ; Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ

  • Venue:
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
  • Year:
  • 2012

Abstract

Protecting on-chip cache memories against soft errors has become an increasingly important challenge in designing reliable next-generation microprocessors. Previous efforts have focused mainly on improving the reliability of the cache data arrays. Because the tag array is crucial to the correctness of cache accesses, it also demands high reliability against soft errors. Exploiting the address locality of memory accesses, we propose duplicating the most recently accessed tag entries in a small tag replication buffer (TRB) to protect the integrity of the tag array in the data cache. Experimental results show that the proposed TRB scheme achieves a high access-with-replica (AWR) rate of 90% with low performance (∼0%), energy (16.3%), and area (19.9%) overheads. We also conduct a detailed design space exploration of the TRB design and propose a selective TRB scheme that achieves a higher AWR rate (97.4%) for dirty cache lines with negligible overheads. To provide a comprehensive evaluation of tag-array reliability, we further conduct an architectural vulnerability factor (AVF) analysis of the tag array in the data cache and propose a refined metric, detected-without-replica-AVF (DOR-AVF), which combines the AVF and AWR analyses. Based on our DOR-AVF analysis, we propose a selective TRB scheme with early write-back (S-TRBEWB), which achieves a zero DOR-AVF and a 100% AWR rate at negligible performance overhead. Results from statistical fault/error injection experiments also confirm the effectiveness of our TRB schemes and the achieved reliability of the cache tag array, with 100% of detected errors recovered.
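
The AWR rate can be illustrated with a toy software model. The sketch below is not the authors' hardware design: it assumes a fully associative TRB with LRU replacement, keyed by cache-line address, and a 32-byte line size, none of which are stated in the abstract. The names `TagReplicationBuffer` and `awr_rate` are hypothetical. It counts the fraction of accesses that find a replica of their tag entry already in the buffer.

```python
from collections import OrderedDict


class TagReplicationBuffer:
    """Toy TRB model. Assumption: fully associative with LRU replacement."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.replicas = OrderedDict()  # line address -> replicated tag

    def access(self, line_addr, tag):
        """Return True if a replica for this line already exists."""
        has_replica = line_addr in self.replicas
        if has_replica:
            # Refresh recency so this entry becomes most recently used.
            self.replicas.move_to_end(line_addr)
        else:
            if len(self.replicas) >= self.num_entries:
                # Evict the least recently used replica.
                self.replicas.popitem(last=False)
            self.replicas[line_addr] = tag
        return has_replica


def awr_rate(trace, num_entries, line_bytes=32):
    """Fraction of accesses in `trace` (byte addresses) that hit a replica.

    Assumption: line_bytes=32 is illustrative, not taken from the paper.
    """
    if not trace:
        return 0.0
    trb = TagReplicationBuffer(num_entries)
    with_replica = 0
    for addr in trace:
        line_addr = addr // line_bytes
        # Simplification: the line address stands in for the stored tag.
        if trb.access(line_addr, tag=line_addr):
            with_replica += 1
    return with_replica / len(trace)
```

For a trace with strong address locality, such as repeated accesses within a few cache lines, even a buffer of a few entries yields a high rate in this model, which is the intuition behind replicating only the most recently accessed tag entries.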