Bloom filtering cache misses for accurate data speculation and prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors
Proceedings of the 2002 international symposium on Low power electronics and design
Token coherence: decoupling performance and correctness
Proceedings of the 30th annual international symposium on Computer architecture
Tuning data replication for improving behavior of MPSoC applications
Proceedings of the 14th ACM Great Lakes symposium on VLSI
AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking
Proceedings of the 32nd annual international symposium on Computer Architecture
Power-performance considerations of parallel computing on chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Flexible Snooping: Adaptive Forwarding and Filtering of Snoops in Embedded-Ring Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Cache coherence tradeoffs in shared-memory MPSoCs
ACM Transactions on Embedded Computing Systems (TECS)
L-CBF: a low-power, fast counting bloom filter architecture
Proceedings of the 2006 international symposium on Low power electronics and design
Scalable Cache Miss Handling for High Memory-Level Parallelism
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing snoop-energy in shared bus-based mpsocs by filtering useless broadcasts
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Proceedings of the 4th international conference on Computing frontiers
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Application-aware snoop filtering for low-power cache coherence in embedded multiprocessors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
The impact of wrong-path memory references in cache-coherent multiprocessor systems
Journal of Parallel and Distributed Computing
Improving the accuracy of snoop filtering using stream registers
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Broadcast filtering-aware task assignment techniques for low-power MPSoCs
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Exploiting access semantics and program behavior to reduce snoop power in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
SoftSig: software-exposed hardware signatures for code analysis and optimization
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Using supplier locality in power-aware interconnects and caches in chip multiprocessors
Journal of Systems Architecture: the EUROMICRO Journal
Proceedings of the 45th annual Design Automation Conference
Energy-efficient MESI cache coherence with pro-active snoop filtering for multicore microprocessors
Proceedings of the 13th international symposium on Low power electronics and design
L-CBF: a low-power, fast counting bloom filter architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Broadcast filtering: Snoop energy reduction in shared bus-based low-power MPSoCs
Journal of Systems Architecture: the EUROMICRO Journal
Low-power inter-core communication through cache partitioning in embedded multiprocessors
Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design: Chip on the Dunes
In-network coherence filtering: snoopy coherence without broadcasts
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
TurboTag: lookup filtering to reduce coherence directory power
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
WAYPOINT: scaling coherence to thousand-core architectures
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Subspace snooping: filtering snoops with operating system support
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
ACM Transactions on Design Automation of Electronic Systems (TODAES)
SigNet: network-on-chip filtering for coarse vector directories
Proceedings of the Conference on Design, Automation and Test in Europe
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Virtual Snooping: Filtering Snoops in Virtualized Multi-cores
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
A workload-adaptive and reconfigurable bus architecture for multicore processors
International Journal of Reconfigurable Computing
Exploring the architecture of a stream register-based snoop filter
Transactions on high-performance embedded architectures and compilers III
Energy-efficient hardware data prefetching
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A composite and scalable cache coherence protocol for large scale CMPs
Proceedings of the international conference on Supercomputing
A 98 GMACs/W 32-core vector processor in 65nm CMOS
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Filtering directory lookups in CMPs with write-through caches
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Dynamic, multi-core cache coherence architecture for power-sensitive mobile processors
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Filtering directory lookups in CMPs
Microprocessors & Microsystems
Memory subsystem characterization in a 16-core snoop-based chip-multiprocessor architecture
HPCC'05 Proceedings of the First international conference on High Performance Computing and Communications
Switch-based packing technique to reduce traffic and latency in token coherence
Journal of Parallel and Distributed Computing
Efficiently enabling conventional block sizes for very large die-stacked DRAM caches
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Using partial tag comparison in low-power snoop-based chip multiprocessors
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Complexity-effective multicore coherence
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
A survey and taxonomy of on-chip monitoring of multicore systems-on-chip
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A dual grain hit-miss detector for large die-stacked DRAM caches
Proceedings of the Conference on Design, Automation and Test in Europe
VGTS: variable granularity transactional snoop
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Heterogeneous system coherence for integrated CPU-GPU systems
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Abstract: We propose methods for reducing the energy consumed by snoop requests in snoopy bus-based symmetric multiprocessor (SMP)systems. Observing that a large fraction of snoops do not find copies in many of the other caches, we introduce JETTY, a small, cache-like structure. A JETTY is introduced in-between the bus and the L2 backside of each processor. There it filters the vast majority of snoops that would not find a locally cached copy. Energy is reduced as accesses to the much more energy demanding L2 tag arrays are decreased. No changes in the existing coherence protocol are required and no performance loss is experienced. We evaluate our method on a 4-way SMP server using a set of shared-memory applications. We demonstrate that a very small JETTY filters 74%(average)of all snoop-induced tag accesses that would miss. This results in an average energy reduction of 29% (range:12%to 40%) measured as a fraction of the energy required by all L2 accesses (both tag and data arrays).