Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Data prefetching for high-performance processors
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Speculative execution via address prediction and data prefetching
ICS '97 Proceedings of the 11th international conference on Supercomputing
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Load execution latency reduction
ICS '98 Proceedings of the 12th international conference on Supercomputing
Characterization and improvement of load/store cache-based prefetching
ICS '98 Proceedings of the 12th international conference on Supercomputing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
A New Voting Based Hardware Data Prefetch Scheme
HIPC '97 Proceedings of the Fourth International Conference on High-Performance Computing
In this paper, we propose a set of four load-balancing techniques to address the memory latency problem of on-chip caches. The first two mechanisms, sequential unification and aggressive lookahead, mainly reduce the chance of partial hits and of accurate prefetch requests being aborted. The latter two mechanisms, default prefetching and cache partitioning, optimize cache performance for unpredictable references. The resulting cache, called the LBD (Load-Balancing Data) cache, is found to have superior performance over a wide range of applications. Simulation of the LBD cache with RPT prefetching (Reference Prediction Table, one of the most cited selective data prefetch schemes [2,3]) on SPEC95 showed a significant reduction in data reference latency, ranging from about 20% to over 90%, with an average of 55.89%. By comparison, prefetch-on-miss and RPT achieve average latency reductions of only 17.37% and 26.05%, respectively.
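For readers unfamiliar with the RPT baseline mentioned above, the following is a minimal sketch of a stride-based reference prediction table. It is not the LBD cache and is not taken from this paper: the class and method names (`ReferencePredictionTable`, `access`), the unbounded dictionary standing in for a fixed-size hardware table, and the four-state transition rules are illustrative assumptions that follow the way such tables are commonly described.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One table entry, keyed by the address (PC) of a load instruction."""
    prev_addr: int
    stride: int = 0
    state: str = "initial"  # one of: initial, transient, steady, no_pred


class ReferencePredictionTable:
    """Simplified stride predictor (hypothetical names, assumed state machine)."""

    def __init__(self):
        # Assumption: an unbounded dict replaces the fixed-size hardware table.
        self.table = {}

    def access(self, pc, addr):
        """Record that the load at `pc` touched `addr`.

        Returns the address to prefetch, or None if no prediction is made.
        """
        entry = self.table.get(pc)
        if entry is None:
            # First time this load is seen: allocate an entry, no prediction yet.
            self.table[pc] = Entry(prev_addr=addr)
            return None

        correct = addr == entry.prev_addr + entry.stride
        if entry.state == "initial":
            if correct:
                entry.state = "steady"
            else:
                entry.stride = addr - entry.prev_addr
                entry.state = "transient"
        elif entry.state == "transient":
            if correct:
                entry.state = "steady"
            else:
                entry.stride = addr - entry.prev_addr
                entry.state = "no_pred"
        elif entry.state == "steady":
            if not correct:
                # One miss in steady state: keep the stride, fall back to initial.
                entry.state = "initial"
        else:  # no_pred
            if correct:
                entry.state = "transient"
            else:
                entry.stride = addr - entry.prev_addr

        entry.prev_addr = addr
        # Prefetch only when the entry is trusted and the stride is non-zero.
        if entry.state != "no_pred" and entry.stride != 0:
            return addr + entry.stride
        return None


if __name__ == "__main__":
    rpt = ReferencePredictionTable()
    # A load walking an array with an 8-byte stride becomes predictable quickly.
    print([rpt.access(0x40, a) for a in (100, 108, 116, 124)])
    # An irregular load stops generating prefetches after two mispredictions.
    print([rpt.access(0x80, a) for a in (100, 300, 50, 900)])
```

In this sketch the regular load triggers prefetches from its second access onward, while the irregular load drops into the no-prediction state. Under these assumptions, references of the second kind are the "unpredictable references" that a table-based scheme leaves uncovered, which is the case the default prefetching and cache partitioning mechanisms in the abstract are said to target.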