Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Stride-directed Prefetching for Secondary Caches
ICPP '97 Proceedings of the international Conference on Parallel Processing
AC/DC: An Adaptive Data Cache Prefetcher
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Data Cache Prefetching Using a Global History Buffer
HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
Implementing Caches in a 3D Technology for High Performance Processors
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Linux physical memory analysis
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Hi-index | 0.00 |
Data cache prefetching in the L2 is at the forefront of pre-fetching research. In this paper we analyze the impact of virtual page boundaries on these prefetchers. Conservative measurements on real hardware show that 30-50% of consecutive virtual pages are mapped to pages which are not consecutive in physical memory. Advanced hardware prefetching techniques that detect access patterns which span virtual page boundaries often end up prefetching data that is from the wrong physical page. Meanwhile, current simulation techniques for evaluating prefetching algorithms assume that all virtual pages are mapped consecutively. We show that not accounting for virtual page boundaries in simulation can lead to overestimates of as much as 29% (9% on average). We also show that a simple prefetch filter can improve performance up to 32% (7% on average) and recover the overestimated performance. This leads to the conclusion that although previous simulations may not have accounted for virtual page boundaries, the results they demonstrate are still attainable and that it is not necessary to simulate virtual page boundaries to get accurate results. However, actual hardware designers should take care to implement a simple filter or else their hardware may not show the same gains in performance as they did in simulation.