Cache coherence techniques for multicore processors

  • Authors:
  • Mark D. Hill; Michael R. Marty

  • Affiliations:
  • The University of Wisconsin - Madison; The University of Wisconsin - Madison

  • Venue:
  • Cache coherence techniques for multicore processors
  • Year:
  • 2008

Quantified Score

Hi-index 0.02

Abstract

Cache coherence mechanisms are a key component in achieving the goal of continued exponential performance growth through widespread thread-level parallelism. This dissertation makes several contributions in the space of cache coherence for multicore chips. First, we recognize that rings are emerging as a preferred on-chip interconnect. Unfortunately, a ring does not preserve the total order provided by a bus. We contribute a new cache coherence protocol that exploits a ring's natural round-robin order. In doing so, we show how our new protocol achieves both fast performance and performance stability, a combination not found in prior designs. Second, we explore cache coherence protocols for systems constructed with several multicore chips. In these Multiple-CMP systems, coherence must occur both within a multicore chip and among multicore chips. Applying hierarchical coherence protocols greatly increases complexity, especially when a bus is not relied upon for the first level of coherence. We first contribute a hierarchical coherence protocol, DirectoryCMP, that uses two directory-based protocols bridged together to create a highly scalable system. We then contribute TokenCMP, which extends token coherence to create a Multiple-CMP system that is flat for correctness yet hierarchical for performance. We qualitatively argue that TokenCMP reduces complexity, and our simulation results demonstrate performance comparable to or better than that of DirectoryCMP. Third, we contribute the idea of virtual hierarchies for designing memory systems optimized for space sharing. With future chips containing abundant cores, the opportunities for space sharing the vast resources will only increase. Our contribution targets consolidated server workloads on a tiled multicore chip. We first show how existing flat coherence protocols fail to accomplish the memory system goals we identify. Then, we impose a two-level virtual coherence and caching hierarchy on a physically flat multicore that harmonizes with workload assignment. In doing so, we improve performance by exploiting the locality of space sharing, we provide performance isolation between workloads, and we maintain globally shared memory to support advanced virtualization features such as dynamic partitioning and content-based page sharing.
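
The token-counting idea that TokenCMP builds on can be illustrated with a minimal sketch. It assumes only the general token coherence rules (each block has a fixed number of tokens, a reader needs at least one token, a writer needs all of them) and is not the dissertation's protocol: the owner token, data validity, message handling, and TokenCMP's hierarchical performance policy are all omitted. The names `TokenBlock`, `TOTAL_TOKENS`, and the holder labels are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed for illustration: a fixed token count per block, e.g. one per cache.
TOTAL_TOKENS = 4


@dataclass
class TokenBlock:
    """Token bookkeeping for a single memory block (illustrative only)."""

    # Maps a holder label to the number of tokens it currently has.
    # All tokens start at memory.
    tokens: dict = field(default_factory=lambda: {"memory": TOTAL_TOKENS})

    def held(self, holder):
        return self.tokens.get(holder, 0)

    def can_read(self, holder):
        # Reading requires at least one token.
        return self.held(holder) >= 1

    def can_write(self, holder):
        # Writing requires every token, so no other reader can exist.
        return self.held(holder) == TOTAL_TOKENS

    def transfer(self, src, dst, count):
        # Tokens are only moved between holders, never created or destroyed.
        if count < 0 or count > self.held(src):
            raise ValueError("source does not hold enough tokens")
        self.tokens[src] = self.held(src) - count
        self.tokens[dst] = self.held(dst) + count

    def total(self):
        return sum(self.tokens.values())


block = TokenBlock()
block.transfer("memory", "cache0", 1)                 # cache0 becomes a reader
block.transfer("memory", "cache0", TOTAL_TOKENS - 1)  # cache0 collects the rest
print(block.can_write("cache0"), block.total())
```

Because the read and write checks depend only on token counts, the safety condition does not refer to where a holder sits in the system. This is one way to read the abstract's description of a design that is "flat for correctness yet hierarchical for performance".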