A class of compatible cache consistency protocols and their support by the IEEE futurebus
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The SGI Origin: a ccNUMA highly scalable server
Proceedings of the 24th annual international symposium on Computer architecture
Multicast snooping: a new coherence method using a multicast address network
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Architecture and design of AlphaServer GS320
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Timestamp snooping: an approach for extending SMPs
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Spider: A High-Speed Network Interconnect
IEEE Micro
Starfire: Extending the SMP Envelope
IEEE Micro
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
WildFire: A Scalable Path for SMPs
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 30th annual international symposium on Computer architecture
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Principles and Practices of Interconnection Networks
Principles and Practices of Interconnection Networks
Improving Multiple-CMP Systems Using Token Coherence
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Coherence Ordering for Ring-based Chip Multiprocessors
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Virtual hierarchies to support server consolidation
Proceedings of the 34th annual international symposium on Computer architecture
An Effective Starvation Avoidance Mechanism to Enhance the Token Coherence Protocol
PDP '07 Proceedings of the 15th Euromicro International Conference on Parallel, Distributed and Network-Based Processing
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Error Detection via Online Checking of Cache Coherence with Token Coherence Signatures
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
Token tenure: PATCHing token counting using directory-based cache coherence
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
POWER4 system microarchitecture
IBM Journal of Research and Development
Switch-based packing technique to reduce traffic and latency in token coherence
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Traditional coherence protocols present a set of difficult trade-offs: the reliance of snoopy protocols on broadcast and ordered interconnects limits their scalability, while directory protocols incur a performance penalty on sharing misses due to indirection. This work introduces Patch (Predictive/Adaptive Token-Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically sending direct requests to reduce sharing latency. Patch extends a standard directory protocol to track tokens and use token-counting rules for enforcing coherence permissions. Token counting allows Patch to support direct requests on an unordered interconnect, while a mechanism called token tenure provides broadcast-free forward progress using the directory protocol's per-block point of ordering at the home along with either timeouts at requesters or explicit race notification messages. Patch makes three main contributions. First, Patch introduces token tenure, which provides broadcast-free forward progress for token-counting protocols. Second, Patch deprioritizes best-effort direct requests to match or exceed the performance of directory protocols without restricting scalability. Finally, Patch provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, Patch is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in between.