Fault-tolerant cache coherence protocols for CMPs: evaluation and trade-offs

  • Authors:
  • Ricardo Fernández-Pascual;José M. García;Manuel E. Acacio;José Duato

  • Affiliations:
  • Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Murcia, Spain;Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Murcia, Spain;Departamento de Ingeniería y Tecnología de Computadores, Universidad de Murcia, Murcia, Spain;Dpto. de Informática de Sistemas y Computadores, Universidad Politécnica de Valencia, Valencia, Spain

  • Venue:
  • HiPC'08 Proceedings of the 15th international conference on High performance computing
  • Year:
  • 2008

Abstract

One way of dealing with the transient faults that will affect the interconnection network of future large-scale Chip Multiprocessor (CMP) systems is to extend the cache coherence protocol. Fault tolerance at the level of the cache coherence protocol has been shown to achieve very low performance overhead in the absence of faults while supporting very high fault rates. In this work, we compare two previously proposed fault-tolerant cache coherence protocols in a common framework and present a new one based on the cache coherence protocol used in AMD Opteron processors. We also thoroughly evaluate the performance of the three protocols, show how to adjust their fault tolerance parameters to achieve a desired level of fault tolerance, and measure the overhead incurred to support very high transient fault rates.