Sequential Unification and Aggressive Lookahead Mechanisms for Data Memory Accesses

Authors:
Chi-Hung Chi;Jun-Li Yuan
Venue:
PaCT '99 Proceedings of the 5th International Conference on Parallel Computing Technologies
Year:
1999

References:
Reducing memory latency via non-blocking and prefetching caches. ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Load execution latency reduction. ICS '98 Proceedings of the 12th international conference on Supercomputing
Effective Hardware-Based Data Prefetching for High-Performance Processors. IEEE Transactions on Computers
Sequential Hardware Prefetching in Shared-Memory Multiprocessors. IEEE Transactions on Parallel and Distributed Systems
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors. ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01

Abstract

Recent work in hybrid data address and value prediction has successfully increased the accuracy of data prefetching. However, many predictable data are still found to be missing from the cache. Detailed investigation showed that this is mainly due to two reasons: (i) partial cache hits for data being prefetched, and (ii) abortion of highly accurate prefetch requests by demand fetch requests. To improve this situation, we propose two mechanisms that reduce the startup latency of prefetch requests: the sequential unification of prefetch and demand requests, and aggressive lookahead. The basic idea behind both mechanisms is to combine accurate data prefetching with the current demand fetch whenever the prefetch accuracy is expected to be high. Simulation of these two mechanisms on RPT (Reference Prediction Table, one of the most cited selective data prefetching schemes [2,3]) using SPEC95 showed that a significant reduction in data reference latency, ranging from a few percent to 60%, can be obtained. Furthermore, the additional hardware support for this scheme is very simple, making the mechanisms attractive for practical cache implementation.
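
To make the basic idea concrete, the following is a minimal Python sketch of a stride predictor in the style usually described for a Reference Prediction Table, together with a helper that pairs a demand fetch with a prefetch only when the predictor is in its high-confidence state. It is an illustration under stated assumptions, not the paper's design: the abstract does not give the details of sequential unification or aggressive lookahead, so the names `ReferencePredictionTable`, `Entry`, and `unified_request`, the unbounded dictionary in place of a fixed-size table, and the choice to predict only in the steady state are all hypothetical simplifications.

```python
from dataclasses import dataclass

# Confidence states of one table entry (simplified, assumed state machine).
INITIAL, TRANSIENT, STEADY, NO_PRED = "initial", "transient", "steady", "no-pred"


@dataclass
class Entry:
    prev_addr: int          # last data address touched by this instruction
    stride: int = 0         # last observed address difference
    state: str = INITIAL


class ReferencePredictionTable:
    """Per-instruction stride predictor (hypothetical, unbounded table)."""

    def __init__(self):
        self.entries = {}   # keyed by the load/store instruction address (PC)

    def access(self, pc, addr):
        """Record one data access; return a predicted next address or None."""
        entry = self.entries.get(pc)
        if entry is None:
            self.entries[pc] = Entry(prev_addr=addr)
            return None

        correct = addr == entry.prev_addr + entry.stride
        if correct:
            # A confirmed stride raises confidence.
            entry.state = TRANSIENT if entry.state == NO_PRED else STEADY
        elif entry.state == STEADY:
            # One miss in steady state: lose confidence but keep the stride.
            entry.state = INITIAL
        else:
            # Otherwise learn the new stride and lower confidence.
            entry.stride = addr - entry.prev_addr
            entry.state = TRANSIENT if entry.state == INITIAL else NO_PRED
        entry.prev_addr = addr

        # Assumption: predict only when accuracy is expected to be high.
        if entry.state == STEADY and entry.stride != 0:
            return addr + entry.stride
        return None


def unified_request(rpt, pc, addr):
    """Return the addresses to fetch together for one demand access.

    The demand address always comes first; a predicted address is appended
    only when the table is confident, so both travel as one request.
    """
    predicted = rpt.access(pc, addr)
    return [addr] if predicted is None else [addr, predicted]


if __name__ == "__main__":
    rpt = ReferencePredictionTable()
    for address in (100, 108, 116, 124, 200):
        print(address, unified_request(rpt, pc=0x400, addr=address))
```

In this sketch a load that walks an array with a constant stride starts carrying a prefetch with its demand fetch on the third access, and stops doing so as soon as the stride is broken.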