Improving Multiple-CMP Systems Using Token Coherence

Authors:
Michael R. Marty;Jesse D. Bingham;Mark D. Hill;Alan J. Hu;Milo M. K. Martin;David A. Wood
Affiliations:
University of Wisconsin-Madison;University of British Columbia;University of Wisconsin-Madison;University of British Columbia;University of Pennsylvania;University of Wisconsin-Madison
Venue:
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Year:
2005



Abstract

Improvements in semiconductor technology now enable Chip Multiprocessors (CMPs). As many future computer systems will use one or more CMPs and support shared memory, such systems will have caches that must be kept coherent. Coherence is a particular challenge for Multiple-CMP (M-CMP) systems. One approach is to use a hierarchical protocol that explicitly separates the intra-CMP coherence protocol from the inter-CMP protocol, but couples them hierarchically to maintain coherence. However, hierarchical protocols are complex, leading to subtle, difficult-to-verify race conditions. Furthermore, most previous hierarchical protocols use directories at one or both levels, incurring indirections (and thus extra latency) for sharing misses, which are common in commercial workloads. In contrast, this paper exploits the separation of correctness substrate and performance policy in the recently proposed token coherence protocol to develop the first M-CMP coherence protocol that is flat for correctness, but hierarchical for performance. Via model checking studies, we show that flat correctness eases verification. Via simulation with micro-benchmarks, we make new protocol variants more robust under contention. Finally, via simulation with commercial workloads on a commercial operating system, we show that new protocol variants can be 10-50% faster than a hierarchical directory protocol.
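
To make "flat for correctness" concrete, the following is a minimal sketch of the token-counting rule that token coherence uses as its correctness substrate: each block has a fixed number of tokens, a cache may read a block only while holding at least one token, and may write only while holding all of them. This is an illustration only, not the protocol evaluated in the paper. The class name `Block`, the constant `TOTAL_TOKENS`, and the node labels such as `cmp0.cache0` are hypothetical, and the sketch omits the owner token, data transfer, persistent requests, and any performance policy.

```python
from dataclasses import dataclass, field

TOTAL_TOKENS = 4  # assumed: fixed number of tokens per memory block


@dataclass
class Block:
    """Token holdings for one memory block (hypothetical illustration)."""
    holdings: dict = field(default_factory=lambda: {"memory": TOTAL_TOKENS})

    def tokens(self, node):
        # Number of tokens the given node currently holds.
        return self.holdings.get(node, 0)

    def transfer(self, src, dst, count):
        # Tokens are only moved between nodes, never created or destroyed.
        if count <= 0 or self.tokens(src) < count:
            raise ValueError("sender does not hold that many tokens")
        self.holdings[src] -= count
        self.holdings[dst] = self.tokens(dst) + count
        assert sum(self.holdings.values()) == TOTAL_TOKENS

    def can_read(self, node):
        # Reading requires at least one token.
        return self.tokens(node) >= 1

    def can_write(self, node):
        # Writing requires every token, so no other node can be reading.
        return self.tokens(node) == TOTAL_TOKENS


block = Block()
# Two caches on different chips each obtain one token and may both read.
block.transfer("memory", "cmp0.cache0", 1)
block.transfer("memory", "cmp1.cache0", 1)
readers_ok = block.can_read("cmp0.cache0") and block.can_read("cmp1.cache0")
writer_blocked = not block.can_write("cmp0.cache0")

# To write, one cache must collect every token, wherever it currently is.
block.transfer("memory", "cmp0.cache0", 2)
block.transfer("cmp1.cache0", "cmp0.cache0", 1)
writer_ok = block.can_write("cmp0.cache0")
```

The rule never consults which chip a cache belongs to, which is one way to read "flat for correctness" in the abstract. How requests are routed to find tokens (for example, trying caches on the same chip first) would be a separate performance policy layered on top and is not modeled here.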