Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Load execution latency reduction
ICS '98 Proceedings of the 12th international conference on Supercomputing
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors
ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01
Recent work in hybrid data address and value prediction has successfully increased the accuracy of data prefetching. However, many predictable data items are still found to be missing from the cache. Detailed investigation showed that this is mainly due to two reasons: (i) partial cache hits for data that are still being prefetched, and (ii) highly accurate prefetch requests being aborted by demand fetch requests. To improve this situation, we propose two mechanisms that reduce the startup latency of prefetch requests: the sequential unification of prefetch and demand requests, and aggressive lookahead. The basic idea behind both mechanisms is to combine accurate data prefetching with the current demand fetch whenever the prefetch accuracy is expected to be high. Simulation of these two mechanisms on RPT (Reference Prediction Table, one of the most cited selective data prefetching schemes [2,3]) using SPEC95 showed that a significant reduction in data reference latency, ranging from a few percent to 60%, can be obtained. Furthermore, the additional hardware support required is very simple, which makes the mechanisms attractive for practical cache implementation.
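For readers unfamiliar with RPT, the following is a minimal sketch of a stride-based reference prediction table, loosely following the four-state scheme usually described for it. It is not the mechanism proposed in this paper, and several details are assumptions made for illustration: the class and method names (`ReferencePredictionTable`, `access`) are hypothetical, the table is an unbounded dictionary rather than a fixed-size hardware structure, and a prefetch address is returned only in the steady state, which is a simplification.

```python
from dataclasses import dataclass

# State names for the per-load predictor (illustrative labels).
INITIAL, TRANSIENT, STEADY, NO_PRED = "initial", "transient", "steady", "no_pred"


@dataclass
class Entry:
    prev_addr: int          # address of the previous access by this load
    stride: int = 0         # last observed address difference
    state: str = INITIAL


class ReferencePredictionTable:
    """Hypothetical software model of a stride predictor keyed by load PC."""

    def __init__(self):
        # Assumption: unbounded table; real hardware uses a small indexed array.
        self.entries = {}

    def access(self, pc, addr):
        """Record a load and return an address to prefetch, or None."""
        e = self.entries.get(pc)
        if e is None:
            self.entries[pc] = Entry(prev_addr=addr)
            return None

        correct = addr == e.prev_addr + e.stride
        if correct:
            # A correct prediction moves the entry toward the steady state.
            e.state = TRANSIENT if e.state == NO_PRED else STEADY
        elif e.state == STEADY:
            # One miss in steady state: fall back but keep the stride.
            e.state = INITIAL
        else:
            # Otherwise learn the new stride and lower confidence.
            e.stride = addr - e.prev_addr
            e.state = TRANSIENT if e.state == INITIAL else NO_PRED
        e.prev_addr = addr

        # Simplification: only prefetch when the stride has been confirmed.
        if e.state == STEADY and e.stride != 0:
            return addr + e.stride
        return None


rpt = ReferencePredictionTable()
predictions = [rpt.access(0x40, a) for a in (100, 108, 116, 124)]
print(predictions)
```

In this sketch, a load that walks an array with a constant stride starts producing prefetch addresses once the stride has been observed twice. The mechanisms described in the abstract target the case where such a high-confidence prefetch would otherwise start too late or be aborted by the demand fetch.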