This paper describes and evaluates a DRAM-page-based cache-line prediction and prefetching architecture. The scheme takes DRAM access timing into account to reduce prefetching overhead, amortizing the high cost of a DRAM access by fetching two cache lines that reside on the same DRAM page in a single access. On each DRAM access, one or two cache blocks may be prefetched. We combine three prediction mechanisms: a history-based predictor, a stride predictor, and one-block lookahead; we make them DRAM-page sensitive and deploy them in an effective adaptive prefetching strategy. Our simulations show that the prefetch mechanism can substantially improve system performance. Using a 32-KB prediction-table cache, the prefetching scheme improves performance by 26%-55% on average over a baseline configuration, depending on the memory model. Moreover, the simulations show that prefetching is more cost-effective than simply increasing the L2-cache size or using a one-block-lookahead prefetching scheme. The results also show that DRAM-page-based prefetching yields higher relative performance as processors get faster, making the scheme more attractive for next-generation processors.
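The core idea in the abstract can be sketched as follows. This is a minimal illustrative model, not the paper's implementation: the line size, DRAM-page size, predictor priority order, and the `history_table`/`prefetch_candidates` names are all assumptions made for the sketch. It shows only the candidate-selection step, i.e. how history, stride, and one-block-lookahead predictions are generated and then filtered so that at most two extra cache lines on the same DRAM page are fetched with the miss.

```python
# Hypothetical sketch of DRAM-page-sensitive prefetch candidate selection.
LINE_SIZE = 64          # assumed cache-line size in bytes
DRAM_PAGE_SIZE = 4096   # assumed DRAM page (row) size in bytes

def same_dram_page(a, b, page_size=DRAM_PAGE_SIZE):
    """True when two byte addresses fall in the same DRAM page (row)."""
    return a // page_size == b // page_size

def prefetch_candidates(miss_addr, last_miss_addr, history_table):
    """Return up to two same-page cache lines to fetch alongside a miss.

    Combines three predictors (in an assumed priority order): a history
    table of observed successor lines, a stride predictor, and one-block
    lookahead (OBL). Candidates on a different DRAM page are discarded,
    since the scheme only piggybacks lines sharing the open DRAM page.
    """
    candidates = []

    # 1. History predictor: previously recorded successor line index.
    hist = history_table.get(miss_addr // LINE_SIZE)
    if hist is not None:
        candidates.append(hist * LINE_SIZE)

    # 2. Stride predictor: repeat the last observed miss stride.
    stride = miss_addr - last_miss_addr
    if stride != 0:
        candidates.append(miss_addr + stride)

    # 3. One-block lookahead: the next sequential cache line.
    candidates.append(miss_addr + LINE_SIZE)

    # Keep only same-DRAM-page candidates, drop duplicates and the
    # missing line itself, and cap at two extra lines per DRAM access.
    out = []
    for c in candidates:
        line = (c // LINE_SIZE) * LINE_SIZE  # align to a line boundary
        if line != miss_addr and same_dram_page(line, miss_addr) and line not in out:
            out.append(line)
        if len(out) == 2:
            break
    return out
```

For example, with no history entry and a 64-byte stride, the stride and OBL predictions coincide and a single line is piggybacked; near the end of a DRAM page, all candidates cross the page boundary and nothing extra is fetched, which is the overhead-avoidance property the abstract emphasizes.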