An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Limitations of cache prefetching on a bus-based multiprocessor
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Profetching and memory system behavior of the SPEC95 benchmark suite
IBM Journal of Research and Development - Special issue: performance analysis and its impact on design
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Proceedings of the 27th annual international symposium on Computer architecture
Space/time trade-offs in hash coding with allowable errors
Communications of the ACM
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
A stateless, content-directed data prefetching mechanism
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Genetic Algorithms and Machine Learning
Machine Learning
AC/DC: An Adaptive Data Cache Prefetcher
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
International Journal of Parallel Programming
A PAB-based multi-prefetcher mechanism
International Journal of Parallel Programming
Proceedings of the 34th annual international symposium on Computer architecture
QoS policies and architecture for cache/memory in CMP platforms
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Fixed and Adaptive Sequential Prefetching in Shared Memory Multiprocessors
ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 01
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Prefetch-Aware DRAM Controllers
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Contention-Aware Scheduling on Multicore Systems
ACM Transactions on Computer Systems (TOCS)
Many-Thread Aware Prefetching Mechanisms for GPGPU Applications
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Resolving a L2-prefetch-caused parallel nonscaling on Intel Core microarchitecture
Journal of Parallel and Distributed Computing
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Prefetch-aware shared resource management for multi-core systems
Proceedings of the 38th annual international symposium on Computer architecture
Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Bandwidth constrained coordinated HW/SW prefetching for multicores
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Global-aware and multi-order context-based prefetching for high-performance processors
International Journal of High Performance Computing Applications
ABS: A low-cost adaptive controller for prefetching in a banked shared last-level cache
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
REEact: a customizable virtual execution manager for multicore platforms
VEE '12 Proceedings of the 8th ACM SIGPLAN/SIGOPS conference on Virtual Execution Environments
Parallel application memory scheduling
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
PACMan: prefetch-aware cache management for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
ACM Transactions on Computer Systems (TOCS)
Application-aware prefetch prioritization in on-chip networks
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
To hardware prefetch or not to prefetch?: a virtualized environment study and core binding approach
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Coordinating prefetching and STT-RAM based last-level cache management for multicore systems
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Orchestrated scheduling and prefetching for GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture
APOGEE: adaptive prefetching on GPUs for energy efficiency
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Meeting midway: improving CMP performance with memory-side prefetching
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
TCPT: thread criticality-driven prefetcher throttling
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Insertion and promotion for tree-based PseudoLRU last-level caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
Aggressive prefetching is very beneficial for memory latency tolerance of many applications. However, it faces significant challenges in multi-core systems. Prefetchers of different cores on a chip multiprocessor (CMP) can cause significant interference with prefetch and demand accesses of other cores. Because existing prefetcher throttling techniques do not address this prefetcher-caused inter-core interference, aggressive prefetching in multi-core systems can lead to significant performance degradation and wasted bandwidth consumption. To make prefetching effective in CMPs, this paper proposes a low-cost mechanism to control prefetcher-caused inter-core interference by dynamically adjusting the aggressiveness of multiple cores' prefetchers in a coordinated fashion. Our solution consists of a hierarchy of prefetcher aggressiveness control structures that combine per-core (local) and prefetcher-caused inter-core (global) interference feedback to maximize the benefits of prefetching on each core while optimizing overall system performance. These structures improve system performance by 23% while reducing bus traffic by 17% compared to employing aggressive prefetching and improve system performance by 14% compared to a state-of-the-art prefetcher aggressiveness control technique on an eight-core system.