Virtual caches are employed as L1 caches in both high-performance and embedded processors to meet their short latency requirements. However, they also introduce the synonym problem, where the same physical cache line can be present at multiple locations in the cache because it is mapped by distinct virtual addresses, leading to potential data consistency issues. To guarantee correctness, common hardware solutions either perform serial lookups of all possible synonym locations in the L1, consuming additional energy, or employ a reverse map in the L2 cache, which incurs a large area overhead. Such preventive mechanisms are nevertheless indispensable, even though synonyms may not always be present during execution.

In this paper, we study the synonym issue using a workload of Windows applications and propose a technique based on Bloom filters to reduce synonym lookup energy. By tracking the address stream with Bloom filters, we can confidently exclude addresses that were never observed and so eliminate unnecessary synonym lookups, thereby saving energy in the L1 cache. Bloom filters have a very small area overhead, making our implementation a feasible and attractive solution for synonym detection. Our results show that synonyms in these applications actually constitute less than 0.1% of the total cache misses. By applying our technique, the dynamic energy consumed in the L1 data cache can be reduced by up to 32.5%. When leakage energy is taken into account, the savings are up to 27.6%.
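The filtering idea can be illustrated with a minimal software sketch. The abstract does not specify the filter organization, so everything concrete below is an assumption made for illustration: the class names `BloomFilter` and `SynonymFilter`, the 1024-bit array, the two BLAKE2b-based hash functions, the 64-byte line size, and the choice to key the filter on the physical line address. A real design would use simple hardware hash functions, and might use a counting or segmented filter so that evicted lines can be removed.

```python
import hashlib

LINE_BITS = 6  # assumption: 64-byte cache lines


class BloomFilter:
    """Plain (non-counting) Bloom filter over non-negative integer keys."""

    def __init__(self, num_bits=1024, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive one bit position per hash from salted BLAKE2b digests.
        data = key.to_bytes(8, "little")
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(data, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means "definitely never added"; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


class SynonymFilter:
    """Hypothetical wrapper deciding whether a miss needs a synonym lookup."""

    def __init__(self):
        self.seen = BloomFilter()
        self.lookups_performed = 0
        self.lookups_skipped = 0

    def record_fill(self, phys_addr):
        # Called whenever a line is brought into the L1.
        self.seen.add(phys_addr >> LINE_BITS)

    def needs_synonym_lookup(self, phys_addr):
        # Called on an L1 miss, once the physical address is known.
        if self.seen.might_contain(phys_addr >> LINE_BITS):
            self.lookups_performed += 1
            return True
        self.lookups_skipped += 1  # line never observed: no synonym possible
        return False


f = SynonymFilter()
first = f.needs_synonym_lookup(0x1000)   # nothing recorded yet
f.record_fill(0x1000)
second = f.needs_synonym_lookup(0x1010)  # same 64-byte line as 0x1000
print(first, second)
```

Because a Bloom filter never produces a false negative, skipping the lookup is safe as long as every fill is recorded. In this plain variant, stale or colliding entries only cause extra lookups, which costs energy but not correctness.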