Predicting Coherence Communication by Tracking Synchronization Points at Run Time

Authors:
Socrates Demetriades;Sangyeun Cho
Affiliations:
-;-
Venue:
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2012

Citing 42
Cited 0

Implementation and performance of Munin

SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Adaptive cache coherency for detecting migratory shared data

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An adaptive cache coherence protocol optimized for migratory sharing

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Data forwarding in scalable shared-memory multiprocessors

ICS '95 Proceedings of the 9th international conference on Supercomputing
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Using prediction to accelerate coherence protocols

Proceedings of the 25th annual international symposium on Computer architecture
Memory sharing predictor: the key to a speculative coherent DSM

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Multicast snooping: a new coherence method using a multicast address network

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Selective, accurate, and timely self-invalidation using last-touch prediction

Proceedings of the 27th annual international symposium on Computer architecture
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors

Proceedings of the 2002 international symposium on Low power electronics and design
Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol

IEEE Transactions on Parallel and Distributed Systems
Simics: A Full System Simulation Platform

Computer
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
An Evaluation of Fine-Grain Producer-Initiated Communication in Cache-Coherent Multiprocessors

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Improving CC-NUMA Performance Using Instruction-Based Prediction

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
The Coherence Predictor Cache: A Resource-Efficient and Accurate Coherence Prediction Infrastructure

IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Token coherence: decoupling performance and correctness

Proceedings of the 30th annual international symposium on Computer architecture
Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors

Proceedings of the 30th annual international symposium on Computer architecture
(R) The Impact of Speeding up Critical Sections with Data Prefetching and Forwarding

ICPP '96 Proceedings of the Proceedings of the 1996 International Conference on Parallel Processing - Volume 3
Memory coherence activity prediction in commercial workloads

WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence

Proceedings of the 32nd annual international symposium on Computer Architecture
In-Network Cache Coherence

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proximity-aware directory-based coherence for multi-core processor architectures

Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Circuit-Switched Coherence

NOCS '08 Proceedings of the Second ACM/IEEE International Symposium on Networks-on-Chip
The PARSEC benchmark suite: characterization and architectural implications

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Improving support for locality and fine-grain sharing in chip multiprocessors

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Accelerating critical section execution with asymmetric multi-core architectures

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
An energy and performance exploration of network-on-chip architectures

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A communication characterisation of Splash-2 and Parsec

IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Proximity coherence for chip multiprocessors

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
A Direct Coherence Protocol for Many-Core Chip Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Dark silicon and the end of multicore scaling

Proceedings of the 38th annual international symposium on Computer architecture
BarrierWatch: characterizing multithreaded workloads across and within program-defined epochs

Proceedings of the 8th ACM International Conference on Computing Frontiers
Phase-Based Application-Driven Hierarchical Power Management on the Single-chip Cloud Computer

PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Bottleneck identification and scheduling in multithreaded applications

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Parallel application memory scheduling

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Predicting target processors that a coherence request must be delivered to can improve the miss handling latency in shared memory systems. In directory coherence protocols, directly communicating with the predicted processors avoids costly indirection to the directory. In snooping protocols, prediction relaxes the high bandwidth requirements by replacing broadcast with multicast. In this work, we propose a new run-time coherence target prediction scheme that exploits the inherent correlation between synchronization points in a program and coherence communication. Our workload-driven analysis shows that by exposing synchronization points to hardware and tracking them at run time, we can simply and effectively track stable and repetitive communication patterns. Based on this observation, we build a predictor that can improve the miss latency of a directory protocol by 13%. Compared with existing address- and instruction-based prediction techniques, our predictor achieves comparable performance using substantially smaller power and storage overheads.