An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing

  • Authors:
  • Liqun Cheng; John B. Carter; Donglai Dai

  • Affiliations:
  • University of Utah, legion@cs.utah.edu, retrac@cs.utah.edu; University of Utah, legion@cs.utah.edu, retrac@cs.utah.edu; Silicon Graphics, Inc., dai@sgi.com

  • Venue:
  • HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
  • Year:
  • 2007

Abstract

Shared memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared memory applications, and their significance is growing as network latency increases relative to processor speeds. This paper proposes two mechanisms that improve shared memory performance by eliminating remote misses and/or reducing the amount of communication required to maintain coherence. We focus on improving the performance of applications that exhibit producer-consumer sharing. We first present a simple hardware mechanism for detecting producer-consumer sharing. We then describe a directory delegation mechanism whereby the "home node" of a cache line can be delegated to a producer node, thereby converting 3-hop coherence operations into 2-hop operations. We then extend the delegation mechanism to support speculative updates for data accessed in a producer-consumer pattern, which can convert 2-hop misses into local misses, thereby eliminating the remote memory latency. Both mechanisms can be implemented without changes to the processor. We evaluate our directory delegation and speculative update mechanisms on seven benchmark programs that exhibit producer-consumer sharing, using a cycle-accurate execution-driven simulator of a future 16-node SGI multiprocessor. We find that the mechanisms proposed in this paper reduce the average remote miss rate by 40%, reduce network traffic by 15%, and improve performance by 21%. Finally, we use Murphi to verify that each mechanism is error-free and does not violate sequential consistency.
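
The hop counts mentioned in the abstract can be illustrated with a toy model. The sketch below is not the paper's protocol or simulator; it only counts the messages on the critical path of a consumer's read of a line that the producer holds modified. The function name `read_miss_hops`, its parameters, and the node numbers are all hypothetical, and the model assumes a simple directory protocol in which the request goes to the directory, is forwarded to the owner, and the owner replies directly to the requester.

```python
def read_miss_hops(consumer, producer, home, delegated=False, updated=False):
    """Count critical-path network hops for a consumer's read of a line
    last written by `producer` (toy model, not the paper's protocol).

    consumer, producer, home: node ids (hypothetical small integers).
    delegated: the directory for this line has been moved to the producer.
    updated: the producer has already pushed the new data to the consumer.
    """
    if updated:
        # Speculative update: the data is already in the consumer's node,
        # so the read is satisfied locally with no network messages.
        return 0
    # The request is sent to whichever node currently holds the directory.
    directory = producer if delegated else home
    # Request path: consumer -> directory -> owner (producer) -> consumer.
    path = [consumer, directory, producer, consumer]
    # A hop is a message between two distinct nodes.
    return sum(1 for src, dst in zip(path, path[1:]) if src != dst)


baseline = read_miss_hops(consumer=2, producer=1, home=0)                    # 3 hops
with_delegation = read_miss_hops(consumer=2, producer=1, home=0,
                                 delegated=True)                            # 2 hops
with_update = read_miss_hops(consumer=2, producer=1, home=0,
                             delegated=True, updated=True)                  # 0 hops
```

In this model, delegation removes the detour through a separate home node, and a speculative update that arrives before the consumer's read removes the network round trip entirely, which matches the 3-hop, 2-hop, and local cases described above.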