Scalable and reliable communication for hardware transactional memory
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Token tenure: PATCHing token counting using directory-based cache coherence
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Fault-tolerant cache coherence protocols for CMPs: evaluation and trade-offs
HiPC'08 Proceedings of the 15th international conference on High performance computing
Token tenure and PATCH: A predictive/adaptive token-counting hybrid
ACM Transactions on Architecture and Code Optimization (TACO)
A systematic methodology to develop resilient cache coherence protocols
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 26th ACM international conference on Supercomputing
A survey of checker architectures
ACM Computing Surveys (CSUR)
Hi-index | 0.00 |
It is widely accepted that transient failures will appear more frequently in chips designed in the near future due to several factors such as the increased integration scale. On the other hand, chip-multiprocessors (CMP) that integrate several processor cores in a single chip are nowadays the best alternative to more efficient use of the increasing number ber of transistors that can be placed in a single die. Hence, it is necessary to design new techniques to deal with these faults to be able to build sufficiently reliable Chip Multiprocessors processors (CMPs). In this work, we present a coherence protocol aimed at dealing with transient failures that affect the interconnection network of a CMP, thus assuming that the network is no longer reliable. In particular; our proposal posal extends a token-based cache coherence protocol so that no data can be lost and no deadlock can occur due to any dropped message. Using GEMS full system simulator, we compare our proposal against a similar protocol without fault tolerance (TOKENCMP). We show that in absence of failures our proposal does not introduce overhead in terms of increased execution time over TOKENCMP. Additionally, our protocol can tolerate message loss rates much higher than those likely to be found in the real world without increasing execution time more than 15%.