Group-caching for NoC based multicore cache coherent systems
Proceedings of the Conference on Design, Automation and Test in Europe
Supporting cache coherence in current multicore processors still faces scalability and performance problems. This paper presents an optimized cache coherence design targeting NoC-based multicore processors. It tries to combine the best characteristics of snooping and directory-based protocols. Based on the observation that network traffic exhibits locality, we design a two-level cache coherence protocol that handles local and remote accesses separately. At the first level, snooping is performed within a cache group; at the second level, coarse directories tell the caches which processors must be involved in first-level snooping. To support efficient coherence broadcasting, we also propose a low-latency, broadcast-enabled underlying NoC design. It incorporates lightweight buses into the NoC, over which the snooping protocol can be performed in a broadcast fashion. Extensive experimental results show that the proposed coherence design achieves both low complexity and high performance.
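The two-level idea described above can be illustrated with a small behavioral sketch. This is not the paper's protocol: the group size, the class name `TwoLevelCoherence`, the present/absent cache state, and the message counter are all assumptions made for illustration, and timing, cache states such as MESI, and the NoC itself are not modeled. It only shows how a coarse directory that tracks groups rather than individual cores could limit which groups must be snooped.

```python
from collections import defaultdict

GROUP_SIZE = 4  # assumed value; the abstract does not state a group size


def group_of(core):
    """Map a core id to its cache group (hypothetical static partition)."""
    return core // GROUP_SIZE


class TwoLevelCoherence:
    """Toy model: snooping inside a group, coarse directory across groups."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        # Per-core cache contents, simplified to "block present or not".
        self.caches = [set() for _ in range(num_cores)]
        # Coarse directory: block address -> ids of groups that may hold it.
        self.directory = defaultdict(set)
        self.snoop_broadcasts = 0  # one count per group-wide broadcast

    def _snoop_group(self, group, block, exclude=None):
        """Broadcast within one group; return the cores holding the block."""
        start = group * GROUP_SIZE
        members = range(start, min(start + GROUP_SIZE, self.num_cores))
        self.snoop_broadcasts += 1
        return [c for c in members if c != exclude and block in self.caches[c]]

    def read(self, core, block):
        """Return where the data came from for this read."""
        if block in self.caches[core]:
            return "hit"
        g = group_of(core)
        if self._snoop_group(g, block, exclude=core):
            source = "local-group"  # first level: served by a group peer
        else:
            source = "memory"
            # Second level: snoop only the groups the directory lists.
            for remote in sorted(self.directory[block] - {g}):
                if self._snoop_group(remote, block):
                    source = "remote-group"
                    break
        self.caches[core].add(block)
        self.directory[block].add(g)
        return source

    def write(self, core, block):
        """Invalidate all other copies, then take the only copy."""
        g = group_of(core)
        for sharer_group in sorted(self.directory[block] | {g}):
            for c in self._snoop_group(sharer_group, block, exclude=core):
                self.caches[c].discard(block)
        self.caches[core].add(block)
        self.directory[block] = {g}
```

In this sketch, a read first broadcasts inside the requester's own group and falls back to the coarse directory only on a group-level miss, so groups that the directory does not list are never snooped.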