Understanding some simple processor-performance limits
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Proceedings of the 27th annual international symposium on Computer architecture
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Effective Instruction Prefetching in Chip Multiprocessors for Modern Commercial Applications
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
POWER5 System microarchitecture
IBM Journal of Research and Development - POWER5 and packaging
Reducing Cache Pollution via Dynamic Data Prefetch Filtering
IEEE Transactions on Computers
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Effective Management of DRAM Bandwidth in Multicore Processors
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
IBM Journal of Research and Development
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Characterizing and modeling the behavior of context switch misses
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Prefetch-Aware DRAM Controllers
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Scaling the bandwidth wall: challenges in and avenues for CMP scaling
Proceedings of the 36th annual international symposium on Computer architecture
Architecture Support for Improving Bulk Memory Copying and Initialization Performance
PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Coordinated control of multiple prefetchers in multi-core systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Understanding the behavior and implications of context switch misses
ACM Transactions on Architecture and Code Optimization (TACO)
Architectural framework for supporting operating system survivability
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
ACCESS: Smart scheduling for asymmetric cache CMPs
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
SRP: symbiotic resource partitioning of the memory hierarchy in CMPs
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Proceedings of the 2nd ACM Symposium on Cloud Computing
Making data prefetch smarter: adaptive prefetching on POWER7
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
A performance-aware quality of service-driven scheduler for multicore processors
ACM SIGBED Review - Special Issue on the 3rd Embedded Operating System Workshop (EWiLi 2013)
Hi-index | 0.00 |
Modern high performance microprocessors widely employ hardware prefetching technique to hide long memory access latency. While very useful, hardware prefetching tends to aggravate the bandwidth wall, a problem where system performance is increasingly limited by the availability of the off-chip pin bandwidth in Chip Multi-Processors (CMPs). In this paper, we propose an analytical model-based study to investigate how hardware prefetching and memory bandwidth partitioning impact CMP system performance and how they interact. The model includes a composite prefetching metric that can help determine under which conditions prefetching can improve system performance, a bandwidth partitioning model that takes into account prefetching effects, and a derivation of the weighted speedup optimum bandwidth partition sizes for different cores. Through model-driven case studies, we find several interesting observations that can be valuable for future CMP system design and optimization. We also explore simulation-based empirical evaluation to validate the observations and show that maximum system performance can be achieved by selective prefetching, guided by the composite prefetching metric, coupled with dynamic bandwidth partitioning.