Decoupled processing seeks to tolerate memory latency by dynamically scheduling memory accesses so that operands are prefetched before they are needed. Because decoupled processors cannot issue memory operations speculatively, control-flow operations can significantly limit their ability to prefetch data. The prefetching architecture proposed here seeks to keep the dynamic scheduling benefits of decoupled processing while allowing memory operations to be issued speculatively. The prefetching mechanism is evaluated on the SPEC95 benchmark suite. It achieves significant reductions in cache miss rate, resulting in speedups of over 40% of peak for most of the inputs.
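The control-flow limitation can be illustrated with a toy model. This is a sketch under stated assumptions, not the architecture proposed in the paper. The names `Op` and `prefetch_window` are hypothetical. The model treats a program as a flat trace of loads and branches and asks how many loads an access stream can issue ahead of execution. In non-speculative mode, the stream stops at the first branch whose outcome depends on loaded data. In speculative mode, the stream assumes perfect branch prediction, meaning the predicted path is the trace itself, and runs past such branches. Mispredictions are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical trace entry: a load or a branch."""
    kind: str                     # "load" or "branch"
    addr: int = 0                 # effective address, used only by loads
    data_dependent: bool = False  # branch outcome depends on a loaded value


def prefetch_window(trace, start, speculative, max_ahead):
    """Return the load addresses the access stream can issue ahead of execution.

    Non-speculative mode models a decoupled access stream that must stall at a
    branch whose outcome depends on loaded data (resolved on the execute side).
    Speculative mode assumes perfect branch prediction: the predicted path is
    exactly the given trace.
    """
    issued = []
    for op in trace[start:start + max_ahead]:
        if op.kind == "branch" and op.data_dependent and not speculative:
            break  # cannot run past an unresolved data-dependent branch
        if op.kind == "load":
            issued.append(op.addr)
    return issued


# Hypothetical loop: each iteration loads a[i] and then branches on its value.
trace = []
for i in range(4):
    trace.append(Op("load", addr=i * 64))
    trace.append(Op("branch", data_dependent=True))

print(prefetch_window(trace, 0, speculative=False, max_ahead=8))  # [0]
print(prefetch_window(trace, 0, speculative=True, max_ahead=8))   # [0, 64, 128, 192]
```

With one data-dependent branch per iteration, the non-speculative stream can prefetch only the current iteration's load. The speculative stream covers all four iterations within the same lookahead window, which is the gap speculative memory operations aim to close.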