Aggressive prefetching mechanisms improve the performance of some important applications, but they substantially increase bus traffic and pressure on the cache tag arrays, and they can even degrade the performance of applications that are not memory bound. We introduce a feedback mechanism, termed the Prefetcher Assessment Buffer (PAB), which filters out prefetch requests that are unlikely to be useful. With it, applications that cannot benefit from aggressive prefetching do not suffer from its side effects. The PAB is evaluated with different trigger configurations, e.g., "all L1 accesses trigger prefetches" and "only L1 misses trigger prefetches". Compared with the non-selective concurrent use of multiple prefetchers, applying the PAB to prefetching from main memory into the L2 cache can reduce the number of loads from main memory by up to 25% without losing performance. Applying more sophisticated techniques to prefetches between the L2 and L1 caches can increase IPC by 4% while reducing the traffic between the caches 8-fold.
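The core idea of such a feedback filter can be sketched as follows. This is a hypothetical illustration, not the paper's actual design: candidate prefetch addresses are recorded in a small buffer rather than issued immediately, a prefetcher earns credit when a later demand access hits one of its recorded addresses, and only prefetchers whose observed accuracy clears a threshold are allowed to issue real prefetches. The class name, buffer capacity, threshold, and warm-up count are all illustrative assumptions.

```python
from collections import OrderedDict

class PrefetcherAssessmentBuffer:
    """Hypothetical sketch of a PAB-style prefetch filter (details assumed)."""

    def __init__(self, capacity=64, threshold=0.5, warmup=32):
        self.buf = OrderedDict()    # candidate addr -> id of proposing prefetcher
        self.capacity = capacity    # assessment buffer size (illustrative)
        self.threshold = threshold  # min useful fraction to keep issuing
        self.warmup = warmup        # assessments before gating kicks in
        self.stats = {}             # prefetcher id -> [useful, total]

    def propose(self, pf_id, addr):
        """A prefetcher proposes addr; return True if it may issue a real prefetch."""
        useful, total = self.stats.setdefault(pf_id, [0, 0])
        self.stats[pf_id][1] += 1
        if addr not in self.buf:
            if len(self.buf) >= self.capacity:
                self.buf.popitem(last=False)  # evict the oldest candidate
            self.buf[addr] = pf_id
        # During warm-up, always issue; afterwards, gate on observed accuracy.
        return total < self.warmup or useful / total >= self.threshold

    def demand_access(self, addr):
        """On a demand access, credit the prefetcher that predicted this address."""
        pf_id = self.buf.pop(addr, None)
        if pf_id is not None:
            self.stats[pf_id][0] += 1
```

Under this sketch, an accurate stride prefetcher keeps issuing, while a prefetcher whose candidates are never demanded is silenced after the warm-up period, which is the filtering behavior the abstract describes.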