With the widening performance gap between the processor and memory, caches are becoming ever more important for high-performance processors. However, as feature sizes shrink and clock speeds rise, cache access latencies are growing. Designers pipeline cache accesses to keep these growing latencies from limiting cache throughput. Nevertheless, longer latencies can still degrade performance significantly by delaying the execution of dependent instructions.

In this paper, we investigate predicting the data cache set and the tag of the memory address as a means of reducing the effective cache access latency. In this technique, the predicted set is used to start the pipelined cache access in parallel with the memory address computation. We also propose a set-address adaptive predictor to improve the prediction accuracy for data cache sets. Our studies found that using set prediction to reduce the load-to-use latency can improve overall processor performance by as much as 24%. We also investigate techniques, such as predicting the data cache line where the data will be present, to limit the increase in cache energy consumption incurred by set prediction. In fact, with line prediction, the techniques in this paper consume about 15% less energy in the data cache than a decoupled-accessed cache with minimum energy consumption, while still maintaining the performance improvement. However, when the energy consumed in the predictor table is also considered, the overall energy consumption is about 35% more than that of a decoupled-accessed cache.
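As a rough illustration of the set-prediction idea described above, the C sketch below models a small PC-indexed predictor table that supplies a guessed cache set before the load's address has been computed, so the set decode can begin in parallel with address generation; once the real address is available, the prediction is verified and the table retrained. All names and sizes here (predict_set, verify_and_train, a 1024-entry table, 128 sets, 64-byte lines, last-value update) are illustrative assumptions and do not reproduce the paper's set-address adaptive predictor.

/* Minimal sketch of PC-indexed data-cache set prediction.
 * Parameters and update policy are assumptions for illustration only. */
#include <stdint.h>
#include <stdio.h>

#define PRED_ENTRIES 1024   /* predictor table entries (assumption) */
#define CACHE_SETS   128    /* data-cache sets (assumption)         */
#define LINE_BYTES   64     /* cache line size in bytes (assumption)*/

static uint16_t set_pred[PRED_ENTRIES];  /* last set seen per load PC */

/* Predict the cache set for a load before its address is available. */
static unsigned predict_set(uint64_t load_pc)
{
    return set_pred[(load_pc >> 2) % PRED_ENTRIES];
}

/* After address computation, check the prediction and train the table.
 * A misprediction would restart the cache access with the true set. */
static int verify_and_train(uint64_t load_pc, uint64_t addr)
{
    unsigned true_set = (unsigned)((addr / LINE_BYTES) % CACHE_SETS);
    unsigned idx      = (unsigned)((load_pc >> 2) % PRED_ENTRIES);
    int correct       = (set_pred[idx] == true_set);
    set_pred[idx]     = (uint16_t)true_set;  /* simple last-value update */
    return correct;
}

int main(void)
{
    uint64_t pc = 0x400100, addr = 0x7fff0040;
    unsigned guess = predict_set(pc);
    printf("predicted set %u, %s\n", guess,
           verify_and_train(pc, addr) ? "correct" : "mispredicted");
    return 0;
}

A full design would also verify the predicted tag and recover from a set misprediction by reissuing the access with the computed address, which is where the performance and energy trade-offs discussed above arise.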