Race-free interconnection networks and multiprocessor consistency
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture
Using prediction to accelerate coherence protocols
Proceedings of the 25th annual international symposium on Computer architecture
Memory sharing predictor: the key to a speculative coherent DSM
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Multicast snooping: a new coherence method using a multicast address network
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Parallel Computer Architecture: A Hardware/Software Approach
Parallel Computer Architecture: A Hardware/Software Approach
The sun fireplane system interconnect
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Alpha 21364 Network Architecture
IEEE Micro
The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Using hints to reduce the read miss penalty for flat COMA protocols
HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
Using Switch Directories to Speed Up Cache-to-Cache Transfers in CC-NUMA Multiprocessors
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Proceedings of the 30th annual international symposium on Computer architecture
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
A unified theory of shared memory consistency
Journal of the ACM (JACM)
Coherence decoupling: making use of incoherence
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
In this work we reduce interconnect power dissipation in Symmetric Multiprocessors or SMPs. We revisit snoopy cache coherence protocols and reduce unnecessary interconnect activity by speculating nodes expected to provide a missing data. Conventional snoopy cache coherence protocols broadcast requests to all nodes, reducing the latency of cache to cache transfer misses at the expense of increasing interconnect power. We show that it is possible to reduce the associated power dissipation if such requests are broadcasted selectively and only to nodes more likely to provide the missing data. We reduce power as we limit access only to the interconnect components between the requester and the supplier node. We evaluate our technique using shared memory applications and show that it is possible to reduce interconnect power by 21% in a 4-way multiprocessor without compromising performance. This comes with negligible hardware overhead.