The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd Annual International Symposium on Computer Architecture
Parallel Computer Architecture: A Hardware/Software Approach
IEEE Micro
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd Annual International Symposium on Computer Architecture
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors
Proceedings of the 33rd Annual International Symposium on Computer Architecture
CATS: cycle accurate transaction-driven simulation with multiple processor simulators
Proceedings of the Conference on Design, Automation and Test in Europe
Broadcast filtering-aware task assignment techniques for low-power MPSoCs
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
In shared-bus multiprocessor systems-on-chip (MPSoCs), snoop-based schemes are widely used to maintain cache coherence. However, many broadcasts are useless because remote caches seldom hold the matching blocks, so their tag lookups supply no data. From the energy perspective, such tag lookups consume unnecessary energy and make the system wasteful. In this paper, we propose a broadcast filtering technique that reduces snoop energy in both the cache and the bus. Broadcast filtering is achieved with the help of a snooping cache and a split bus. The snooping cache checks whether matching blocks exist in remote caches before broadcasting a coherency request; if no remote cache has the matching block, the broadcast is eliminated. If broadcasting is necessary, only part of the split bus is used, so the request is selectively broadcast only to the remote caches that hold matching blocks. Simulation results show that our technique reduces cache tag lookups by 90%, bus usage by 50%, and snoop energy by 30%, with only 2% performance degradation. Our technique saves more energy than other state-of-the-art techniques.
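The filtering idea in the abstract can be illustrated with a small behavioral sketch. This is a hypothetical model, not the paper's implementation: `SnoopFilter`, `on_fill`, `on_evict`, and `broadcast` are illustrative names, and exact per-cache presence sets stand in for whatever tracking structure the snooping cache actually uses. A coherency request is forwarded only to remote caches whose presence set may contain the block; if no remote copy exists, the broadcast (and every remote tag lookup) is eliminated.

```python
# Hypothetical sketch of broadcast filtering on a split bus.
# Presence is tracked exactly here for clarity; a real design would
# use compact, possibly conservative structures in the snooping cache.

class SnoopFilter:
    """Tracks, per remote cache, which block addresses it may hold."""

    def __init__(self, num_caches):
        self.presence = [set() for _ in range(num_caches)]

    def on_fill(self, cache_id, block_addr):
        # A block was brought into cache_id's cache.
        self.presence[cache_id].add(block_addr)

    def on_evict(self, cache_id, block_addr):
        # The block left cache_id's cache; stop snooping it there.
        self.presence[cache_id].discard(block_addr)

    def targets(self, requester_id, block_addr):
        """Remote caches whose tag arrays must actually be looked up."""
        return [c for c, blocks in enumerate(self.presence)
                if c != requester_id and block_addr in blocks]


def broadcast(filter_, requester_id, block_addr):
    """Return the list of remote caches the request is driven to.

    An empty list means the broadcast is filtered out entirely:
    no bus segment is used and no remote tag lookup occurs.
    """
    return filter_.targets(requester_id, block_addr)


# Example: cache 2 holds block 0x40. A request from cache 0 is
# selectively broadcast only to cache 2's bus segment, while a
# request for an unshared block is filtered out completely.
f = SnoopFilter(num_caches=4)
f.on_fill(2, 0x40)
assert broadcast(f, 0, 0x40) == [2]   # selective broadcast
assert broadcast(f, 0, 0x80) == []    # broadcast eliminated
```

Because only the segments leading to caches in `targets` are driven, both the remote tag-lookup energy and the bus-switching energy scale with the number of actual sharers rather than with the number of caches, which is the effect the reported 90%/50%/30% reductions quantify.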