Speculative supplier identification for reducing power of interconnects in snoopy cache coherence protocols

Authors:
Ehsan Atoofian;Amirali Baniasadi;Kaveh Aasaraai
Affiliations:
University of Victoria, Victoria, BC, Canada;University of Victoria, Victoria, BC, Canada;University of Victoria, Victoria, BC, Canada
Venue:
Proceedings of the 4th international conference on Computing frontiers
Year:
2007

Citing 21
Cited 0

Race-free interconnection networks and multiprocessor consistency

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Alternative implementations of two-level adaptive branch prediction

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Memory system characterization of commercial workloads

Proceedings of the 25th annual international symposium on Computer architecture
Using prediction to accelerate coherence protocols

Proceedings of the 25th annual international symposium on Computer architecture
Memory sharing predictor: the key to a speculative coherent DSM

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Multicast snooping: a new coherence method using a multicast address network

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Parallel Computer Architecture: A Hardware/Software Approach

Parallel Computer Architecture: A Hardware/Software Approach
The sun fireplane system interconnect

Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Temporally silent stores

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Alpha 21364 Network Architecture

IEEE Micro
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Orion: a power-performance simulator for interconnection networks

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Using hints to reduce the read miss penalty for flat COMA protocols

HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
Using Switch Directories to Speed Up Cache-to-Cache Transfers in CC-NUMA Multiprocessors

IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors

Proceedings of the 30th annual international symposium on Computer architecture
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A unified theory of shared memory consistency

Journal of the ACM (JACM)
Coherence decoupling: making use of incoherence

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Coarse-Grain Coherence Tracking: RegionScout and Region Coherence Arrays

IEEE Micro

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this work we reduce interconnect power dissipation in Symmetric Multiprocessors or SMPs. We revisit snoopy cache coherence protocols and reduce unnecessary interconnect activity by speculating nodes expected to provide a missing data. Conventional snoopy cache coherence protocols broadcast requests to all nodes, reducing the latency of cache to cache transfer misses at the expense of increasing interconnect power. We show that it is possible to reduce the associated power dissipation if such requests are broadcasted selectively and only to nodes more likely to provide the missing data. We reduce power as we limit access only to the interconnect components between the requester and the supplier node. We evaluate our technique using shared memory applications and show that it is possible to reduce interconnect power by 21% in a 4-way multiprocessor without compromising performance. This comes with negligible hardware overhead.