Inexpensive implementations of set-associativity
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Cache designs with partial address matching
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Memory dependence prediction using store sets
Proceedings of the 25th annual international symposium on Computer architecture
Speculation techniques for improving load related instruction scheduling
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
Tuning the Pentium Pro Microarchitecture
IEEE Micro
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
The Alpha 21264 Microprocessor
IEEE Micro
Recovery Mechanism for Latency Misprediction
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Predictive sequential associative cache
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
JETTY: Filtering Snoops for Reduced Energy Consumption in SMP Servers
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Scalable Hardware Memory Disambiguation for High ILP Processors
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Design and Optimization of Large Size and Low Overhead Off-Chip Caches
IEEE Transactions on Computers
AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Effects of speculation on performance and issue queue design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
L-CBF: a low-power, fast counting bloom filter architecture
Proceedings of the 2006 international symposium on Low power electronics and design
Reducing energy of virtual cache synonym lookup using bloom filters
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
A comparison of two policies for issuing instructions speculatively
Journal of Systems Architecture: the EUROMICRO Journal
Reducing non-deterministic loads in low-power caches via early cache set resolution
Microprocessors & Microsystems
SoftSig: software-exposed hardware signatures for code analysis and optimization
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Thrifty BTB: A comprehensive solution for dynamic power reduction in branch target buffers
Microprocessors & Microsystems
L-CBF: a low-power, fast counting bloom filter architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Recruiting Decay for Dynamic Power Reduction in Set-Associative Caches
Transactions on High-Performance Embedded Architectures and Compilers II
Way guard: a segmented counting bloom filter approach to reducing energy for set-associative caches
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
The design of a bloom filter hardware accelerator for ultra low power systems
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Applying decay to reduce dynamic power in set-associative caches
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
TurboTag: lookup filtering to reduce coherence directory power
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
An Adaptive Data Prefetcher for High-Performance Processors
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
A Generalized Bloom Filter to Secure Distributed Network Applications
Computer Networks: The International Journal of Computer and Telecommunications Networking
Efficient system-on-chip energy management with a segmented bloom filter
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
PACMan: prefetch-aware cache management for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
The evicted-address filter: a unified mechanism to address both cache pollution and thrashing
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
SLICC: Self-Assembly of Instruction Cache Collectives for OLTP Workloads
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
A processor must know a load instruction's latency to schedule the load's dependent instructions at the correct time. Unfortunately, modern processors do not know this latency until well after the dependent instructions should have been scheduled to avoid pipeline bubbles between themselves and the load. One solution to this problem is to predict the load's latency, by predicting whether the load will hit or miss in the data cache. Existing cache hit/miss predictors, however, can only correctly predict about 50% of cache misses.This paper introduces a new hit/miss predictor that uses a Bloom Filter to identify cache misses early in the pipeline. This early identification of cache misses allows the processor to more accurately schedule instructions that are dependent on loads and to more precisely prefetch data into the cache. Simulations using a modified SimpleScalar model show that the proposed Bloom Filter is nearly perfect, with a prediction accuracy greater than 99% for the SPECint2000 benchmarks. IPC (Instructions Per Cycle) performance improved by 19% over a processor that delayed the scheduling of instructions dependent on a load until the load latency was known, and by 6% and 7% over a processor that always predicted a load would hit the cache and with a counter-based hit/miss predictor respectively. This IPC reaches 99.7% of the IPC of a processor with perfect scheduling.