Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Profetching and memory system behavior of the SPEC95 benchmark suite
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Proceedings of the 27th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Sequential Hardware Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Memory Prefetching Using Adaptive Stream Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors
ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01
Effective Management of DRAM Bandwidth in Multicore Processors
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Prefetch-Aware DRAM Controllers
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Coordinated control of multiple prefetchers in multi-core systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
Prefetching is a highly effective latency hiding technique that can greatly improve application performance. However, aggressive prefetching can potentially stress the off-chip bandwidth. The resulting bandwidth stalls can potentially negate the performance gain due to prefetching. In this paper, focusing on a multicore environment, we first study the comparative benefits of hardware and software prefetching and analyze if the two are complimentary or redundant. This analysis also evaluates different aggressiveness levels of hardware prefetching. Secondly, we weigh the positive performance benefits of prefetching against the negative performance effects of bandwidth stalls. Thirdly, we propose a hierarchical prefetch management scheme for multicores that controls the prefetch levels such that the overall performance gain is improved. Lastly, we show that our proposed off-chip bandwidth aware prefetch management scheme is very effective in practice, leading to performance gains of upto about 10% in system throughput over a bandwidth agnostic prefetching scheme.