Token tenure and PATCH: A predictive/adaptive token-counting hybrid

Authors:
Arun Raghavan;Colin Blundell;Milo M. K. Martin
Affiliations:
University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA;University of Pennsylvania, Philadelphia, PA
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2010

Abstract

Traditional coherence protocols present a set of difficult trade-offs: the reliance of snoopy protocols on broadcast and ordered interconnects limits their scalability, while directory protocols incur a performance penalty on sharing misses due to indirection. This work introduces Patch (Predictive/Adaptive Token-Counting Hybrid), a coherence protocol that provides the scalability of directory protocols while opportunistically sending direct requests to reduce sharing latency. Patch extends a standard directory protocol to track tokens and use token-counting rules for enforcing coherence permissions. Token counting allows Patch to support direct requests on an unordered interconnect, while a mechanism called token tenure provides broadcast-free forward progress using the directory protocol's per-block point of ordering at the home along with either timeouts at requesters or explicit race notification messages. Patch makes three main contributions. First, Patch introduces token tenure, which provides broadcast-free forward progress for token-counting protocols. Second, Patch deprioritizes best-effort direct requests to match or exceed the performance of directory protocols without restricting scalability. Finally, Patch provides greater scalability than directory protocols when using inexact encodings of sharers because only processors holding tokens need to acknowledge requests. Overall, Patch is a “one-size-fits-all” coherence protocol that dynamically adapts to work well for small systems, large systems, and anywhere in between.
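
The token-counting rules mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual token-coherence convention that each block has a fixed number of tokens, that holding at least one token permits reading, and that holding all tokens permits writing. The names `TOTAL_TOKENS`, `CacheLine`, and `transfer` are hypothetical and chosen for illustration only; the sketch does not model token tenure, the directory, timeouts, or race notification messages.

```python
from dataclasses import dataclass

# Assumption for illustration: a fixed number of tokens per block,
# e.g. one per processor in a 4-processor system.
TOTAL_TOKENS = 4


@dataclass
class CacheLine:
    """Tokens one holder (a cache or the home) has for a single block."""
    tokens: int = 0

    def can_read(self) -> bool:
        # At least one token grants read permission.
        return self.tokens >= 1

    def can_write(self) -> bool:
        # All tokens are required for write permission.
        return self.tokens == TOTAL_TOKENS


def transfer(src: CacheLine, dst: CacheLine, count: int) -> None:
    """Move tokens between holders; tokens are never created or destroyed."""
    if count < 0 or count > src.tokens:
        raise ValueError("cannot transfer more tokens than are held")
    src.tokens -= count
    dst.tokens += count


if __name__ == "__main__":
    home = CacheLine(tokens=TOTAL_TOKENS)  # home starts with every token
    a, b = CacheLine(), CacheLine()

    transfer(home, a, 1)   # a becomes a reader
    transfer(home, b, 3)   # b holds the remaining tokens
    transfer(a, b, 1)      # a gives up its token so that b can write
    print(a.can_read(), b.can_write())
```

Because permissions depend only on token counts, a response to a direct request cannot by itself violate coherence regardless of message ordering. Guaranteeing forward progress is the separate problem that the abstract assigns to token tenure.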