Cache coherence techniques for multicore processors

Authors: Mark D. Hill; Michael R. Marty
Affiliations: The University of Wisconsin - Madison; The University of Wisconsin - Madison
Venue: Cache coherence techniques for multicore processors
Year: 2008



Abstract

Cache coherence mechanisms are a key component in achieving the goal of continued exponential performance growth through widespread thread-level parallelism. This dissertation makes several contributions in the space of cache coherence for multicore chips. First, we recognize that rings are emerging as a preferred on-chip interconnect. Unfortunately, a ring does not preserve the total order provided by a bus. We contribute a new cache coherence protocol that exploits a ring's natural round-robin order. In doing so, we show how our new protocol achieves both fast performance and performance stability, a combination not found in prior designs. Second, we explore cache coherence protocols for systems constructed with several multicore chips. In these Multiple-CMP systems, coherence must occur both within a multicore chip and among multicore chips. Applying hierarchical coherence protocols greatly increases complexity, especially when a bus is not relied upon for the first level of coherence. We first contribute a hierarchical coherence protocol, DirectoryCMP, that uses two directory-based protocols bridged together to create a highly scalable system. We then contribute TokenCMP, which extends token coherence to create a Multiple-CMP system that is flat for correctness yet hierarchical for performance. We qualitatively argue that TokenCMP reduces complexity, and our simulation results demonstrate performance comparable to or better than that of DirectoryCMP. Third, we contribute the idea of virtual hierarchies for designing memory systems optimized for space sharing. With future chips containing abundant cores, the opportunities for space sharing the vast resources will only increase. Our contribution targets consolidated server workloads on a tiled multicore chip. We first show how existing flat coherence protocols fail to accomplish the memory system goals we identify. Then, we impose a two-level virtual coherence and caching hierarchy on a physically flat multicore that harmonizes with workload assignment. In doing so, we improve performance by exploiting the locality of space sharing, we provide performance isolation between workloads, and we maintain globally shared memory to support advanced virtualization features such as dynamic partitioning and content-based page sharing.
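
The token coherence idea that TokenCMP extends can be illustrated with a small sketch. It is not the dissertation's protocol: it shows only the general token-counting rule, in which each memory block has a fixed number of tokens, a holder may read the block with at least one token, and a holder may write it only with all of them. The names `TOTAL_TOKENS`, `Holder`, `transfer`, and `acquire_for_write` are hypothetical, and the sketch omits data movement, the owner token, and any mechanism for resolving races.

```python
from dataclasses import dataclass

# Assumed for illustration: a fixed token count for one memory block.
TOTAL_TOKENS = 4


@dataclass
class Holder:
    """Anything that can hold tokens for the block: a cache or memory."""
    name: str
    tokens: int = 0

    def can_read(self) -> bool:
        # Reading requires at least one token.
        return self.tokens >= 1

    def can_write(self) -> bool:
        # Writing requires every token, so no other holder can read.
        return self.tokens == TOTAL_TOKENS


def transfer(src: Holder, dst: Holder, count: int) -> None:
    """Move tokens between holders; tokens are never created or destroyed."""
    if count < 0 or count > src.tokens:
        raise ValueError("cannot transfer more tokens than the source holds")
    src.tokens -= count
    dst.tokens += count


def acquire_for_write(requester: Holder, holders: list[Holder]) -> None:
    """Collect every token from all other holders into the requester."""
    for holder in holders:
        if holder is not requester:
            transfer(holder, requester, holder.tokens)


# Memory starts with all tokens; two caches each obtain one to read.
memory = Holder("memory", TOTAL_TOKENS)
cache_a = Holder("cache_a")
cache_b = Holder("cache_b")
everyone = [memory, cache_a, cache_b]

transfer(memory, cache_a, 1)
transfer(memory, cache_b, 1)
readers_before = (cache_a.can_read(), cache_b.can_read())
writer_before = cache_a.can_write()

# cache_a wants to write, so it gathers every token in the system.
acquire_for_write(cache_a, everyone)
```

Because the safety rule depends only on token counts and not on where the holders sit, the same check applies whether tokens move within one chip or between chips, which is one way to read the abstract's phrase "flat for correctness yet hierarchical for performance".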