On the inclusion properties for multi-level cache hierarchies
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing
Proceedings of the 27th annual international symposium on Computer architecture
Region-based caching: an energy-delay efficient memory architecture for embedded processors
CASES '00 Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
Slipstream processors: improving both performance and fault tolerance
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors
Proceedings of the 2002 international symposium on Low power electronics and design
The 82460GX Sever/Workstation Chip Set
IEEE Micro
Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling
IEEE Transactions on Parallel and Distributed Systems
Multiprocessor validation of the Pentium Pro microprocessor
COMPCON '96 Proceedings of the 41st IEEE International Computer Conference
A Performance Comparison of Hierarchical Ring- and Mesh- Connected Multiprocessor Networks
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Transient-fault recovery for chip multiprocessors
Proceedings of the 30th annual international symposium on Computer architecture
Energy efficient D-TLB and data cache using semantic-aware multilateral partitioning
Proceedings of the 2003 international symposium on Low power electronics and design
Stack Value File: Custom Microarchitecture for the Stack
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking
Proceedings of the 32nd annual international symposium on Computer Architecture
Store Vulnerability Window (SVW): Re-Execution Filtering for Enhanced Load Optimization
Proceedings of the 32nd annual international symposium on Computer Architecture
An Integrated Framework for Dependable and Revivable Architectures Using Multicore Processors
Proceedings of the 33rd annual international symposium on Computer Architecture
DSD '06 Proceedings of the 9th EUROMICRO Conference on Digital System Design
Design tradeoffs for tiled CMP on-chip networks
Proceedings of the 20th annual international conference on Supercomputing
Late-binding: enabling unordered load-store queues
Proceedings of the 34th annual international symposium on Computer architecture
Efficient system-on-chip energy management with a segmented bloom filter
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Low-power inter-core communication through cache partitioning in embedded multiprocessors
Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design: Chip on the Dunes
Efficient program scheduling for heterogeneous multi-core processors
Proceedings of the 46th Annual Design Automation Conference
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Low-power snoop architecture for synchronized producer-consumer embedded multiprocessing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
TurboTag: lookup filtering to reduce coherence directory power
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Stack filter: Reducing L1 data cache power consumption
Journal of Systems Architecture: the EUROMICRO Journal
Filtering directory lookups in CMPs with write-through caches
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Filtering directory lookups in CMPs
Microprocessors & Microsystems
Using partial tag comparison in low-power snoop-based chip multiprocessors
ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Exploiting semantics of virtual memory to improve the efficiency of the on-chip memory system
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
VGTS: variable granularity transactional snoop
Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
Hi-index | 0.00 |
Integrating more processor cores on-die has become the unanimous trend in the microprocessor industry. Most of the current research thrusts using chip multiprocessors (CMPs) as the baseline to analyze problems in various domains. One of the main design issues facing CMP systems is the growing number of snoops required to maintain cache coherency and to support self/cross-modifying code that leads to power and performance limitations. In this paper, we analyze the internal and external snoop behavior in a CMP system and relax the snoopy cache coherence protocol based on the program semantics and properties of the shared variables for saving power. Based on the observations and analyses, we propose two novel techniques: Selective Snoop Probe (SSP) and Essential Snoop Probe (ESP) to reduce power without compromising performance. Our simulation results show that using the SSPtechnique, 5% to 65% data cache energy savings per core for different processor configurations can be achieved with 1% to 2% performance improvement. We also show that 5% to 82% of data cache energy per core is spent on the non-essential snoop probes that can be saved using the ESP technique.