A Cache Error Propagation Model

  • Authors:
  • A. K. Somani; K. S. Trivedi

  • Venue:
  • PRFTS '97 Proceedings of the 1997 Pacific Rim International Symposium on Fault-Tolerant Systems
  • Year:
  • 1997

Abstract

Cache memory is a small, fast memory system that holds frequently used data. As processor speeds increase, aggressive design practices raise the probability of faults and of latent errors, because the processor allows only a short time for each read and write. A fault may corrupt the cache memory system or lead to an erroneous internal CPU state. The authors investigate error propagation in the cache memory system due to transient faults in the cache memory itself, in the processor's registers, or in both. The information gained from such an investigation should lead to more effective error recovery mechanisms against failures caused by transient faults in the machine's cache memory and register set. They establish that although the computer system can recover about 50% of the time from the effect of a single erroneous cache location or processor register, the other 50% of the time error recovery is effected only through specific recovery mechanisms. Their results are obtained using both a discrete-time Markov model and error injection on a real system.
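
To make the discrete-time Markov view concrete, here is a minimal sketch of how a single latent error might be tracked. It is not the authors' model: the three states (LATENT, MASKED, PROPAGATED), the function name absorption_probabilities, and the per-cycle probabilities p_overwrite and p_read are all hypothetical, chosen only for illustration. Under these assumptions the erroneous location is either overwritten before use (masked) or read and used (propagated), and the chain is iterated until nearly all probability mass is absorbed.

```python
def absorption_probabilities(p_overwrite, p_read, steps=10_000):
    """Iterate a 3-state discrete-time Markov chain.

    Hypothetical states (illustrative, not taken from the paper):
      0 LATENT      erroneous location present but not yet used
      1 MASKED      location overwritten before being read (absorbing)
      2 PROPAGATED  erroneous value read and used (absorbing)

    Returns the probabilities of (MASKED, PROPAGATED) after `steps` cycles.
    """
    if p_overwrite < 0 or p_read < 0 or p_overwrite + p_read > 1:
        raise ValueError("per-cycle probabilities must be >= 0 and sum to <= 1")

    # One-step transition matrix; each row sums to 1.
    P = [
        [1.0 - p_overwrite - p_read, p_overwrite, p_read],
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
    ]

    # Start with the error certainly latent.
    dist = [1.0, 0.0, 0.0]
    for _ in range(steps):
        # One step of the chain: new distribution = old distribution * P.
        dist = [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]
    return dist[1], dist[2]


if __name__ == "__main__":
    masked, propagated = absorption_probabilities(p_overwrite=0.02, p_read=0.02)
    print(f"masked: {masked:.3f}, propagated: {propagated:.3f}")
```

For this toy chain the long-run probability of masking is p_overwrite / (p_overwrite + p_read) whenever that sum is positive, so equal overwrite and read rates split the outcomes evenly. The input values above are assumptions, not measurements from the paper.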