High-bandwidth data memory systems for superscalar processors
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A model for estimating trace-sample miss ratios
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Cache miss heuristics and preloading techniques for general-purpose programs
Proceedings of the 28th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Examination of a memory access classification scheme for pointer-intensive and numeric programs
ICS '96 Proceedings of the 10th international conference on Supercomputing
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Capturing dynamic memory reference behavior with adaptive cache topology
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Prefetching Using Markov Predictors
IEEE Transactions on Computers - Special issue on cache memory and related problems
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Selective, accurate, and timely self-invalidation using last-touch prediction
Proceedings of the 27th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Establishing a tight bound on task interference in embedded system instruction caches
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Profile-guided post-link stride prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Using a user-level memory thread for correlation prefetching
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
A Decoupled Predictor-Directed Stream Prefetching Architecture
IEEE Transactions on Computers
Using the Compiler to Improve Cache Replacement Decisions
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
TCP: Tag Correlating Prefetchers
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
Correlation Prefetching with a User-Level Memory Thread
IEEE Transactions on Parallel and Distributed Systems
Performance, energy, and reliability tradeoffs in replicating hot cache lines
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Self-correcting LRU replacement policies
Proceedings of the 1st conference on Computing frontiers
Effective stream-based and execution-based data prefetching
Proceedings of the 18th annual international conference on Supercomputing
Implementing branch-predictor decay using quasi-static memory cells
ACM Transactions on Architecture and Code Optimization (TACO)
MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Memory coherence activity prediction in commercial workloads
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Memory predecryption: hiding the latency overhead of memory encryption
ACM SIGARCH Computer Architecture News - Special issue: Workshop on architectural support for security and anti-virus (WASSA)
Computing Architectural Vulnerability Factors for Address-Based Structures
Proceedings of the 32nd annual international symposium on Computer Architecture
Characterization of L3 cache behavior of SPECjAppServer2002 and TPC-C
Proceedings of the 19th annual international conference on Supercomputing
Store-Ordered Streaming of Shared Memory
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Counter-Based Cache Replacement Algorithms
ICCD '05 Proceedings of the 2005 International Conference on Computer Design
Spectral prefetcher: An effective mechanism for L2 cache prefetching
ACM Transactions on Architecture and Code Optimization (TACO)
Prefetching-aware cache line turnoff for saving leakage energy
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
International Journal of Parallel Programming
Proceedings of the 33rd annual international symposium on Computer Architecture
Program Counter-Based Prediction Techniques for Dynamic Power Management
IEEE Transactions on Computers
International Journal of Parallel Programming
Evaluating instruction cache vulnerability to transient errors
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Data prefetching in a cache hierarchy with high bandwidth and capacity
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Victim management in a cache hierarchy
IBM Journal of Research and Development - Advanced silicon technology
Future execution: A prefetching mechanism that uses multiple cores to speed up single threads
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing Cache Pollution via Dynamic Data Prefetch Filtering
IEEE Transactions on Computers
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
Program-counter-based pattern classification in buffer caching
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Evaluating instruction cache vulnerability to transient errors
ACM SIGARCH Computer Architecture News
Data prefetching in a cache hierarchy with high bandwidth and capacity
ACM SIGARCH Computer Architecture News
Proceedings of the 5th conference on Computing frontiers
Low-Cost Adaptive Data Prefetching
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Capturing and optimizing the interactions between prefetching and cache line turnoff
Microprocessors & Microsystems
Cache bursts: A new approach for eliminating dead blocks and increasing cache efficiency
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Less reused filter: improving l2 cache performance via filtering less reused lines
Proceedings of the 23rd international conference on Supercomputing
Stream chaining: exploiting multiple levels of correlation in data prefetching
Proceedings of the 36th annual international symposium on Computer architecture
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
Architecture Design for Soft Errors
Architecture Design for Soft Errors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Timing local streams: improving timeliness in data prefetching
Proceedings of the 24th ACM International Conference on Supercomputing
High performance cache replacement using re-reference interval prediction (RRIP)
Proceedings of the 37th annual international symposium on Computer architecture
Using dead blocks as a virtual victim cache
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Coterminous locality and coterminous group data prefetching on chip-multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Sampling Dead Block Prediction for Last-Level Caches
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Predicting cost amortization for query services
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
On the theory and potential of LRU-MRU collaborative cache management
Proceedings of the international symposium on Memory management
Reducing Network-on-Chip energy consumption through spatial locality speculation
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Bypass and insertion algorithms for exclusive last-level caches
Proceedings of the 38th annual international symposium on Computer architecture
Dynamic access distance driven cache replacement
ACM Transactions on Architecture and Code Optimization (TACO)
Using runtime activity to dynamically filter out inefficient data prefetches
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Global-aware and multi-order context-based prefetching for high-performance processors
International Journal of High Performance Computing Applications
Do trace cache, value prediction and prefetching improve SMT throughput?
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
When Prefetching Works, When It Doesn’t, and Why
ACM Transactions on Architecture and Code Optimization (TACO)
SHiP: signature-based hit predictor for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
A generalized theory of collaborative caching
Proceedings of the 2012 international symposium on Memory Management
Improving writeback efficiency with decoupled last-write prediction
Proceedings of the 39th Annual International Symposium on Computer Architecture
Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Optimal bypass monitor for high performance last-level caches
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Exploiting reuse locality on inclusive shared last-level caches
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Proceedings of the 40th Annual International Symposium on Computer Architecture
Managing shared last-level cache in a heterogeneous multicore processor
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
RDIP: return-address-stack directed instruction prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Insertion and promotion for tree-based PseudoLRU last-level caches
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
The reuse cache: downsizing the shared last-level cache
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient management of last-level caches in graphics processors for 3D scene rendering workloads
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Virtually split cache: An efficient mechanism to distribute instructions and data
ACM Transactions on Architecture and Code Optimization (TACO)
Temporal-based multilevel correlating inclusive cache replacement
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.01 |
Effective data prefetching requires accurate mechanisms to predict both “which” cache blocks to prefetch and “when” to prefetch them. This paper proposes the Dead-Block Predictors (DBPs), trace-based predictors that accurately identify “when” an Ll data cache block becomes evictable or “dead”. Predicting a dead block significantly enhances prefetching lookahead and opportunity, and enables placing data directly into Ll, obviating the need for auxiliary prefetch buffers. This paper also proposes Dead-Block Correlating Prefetchers (DBCPs), that use address correlation to predict “which” subsequent block to prefetch when a block becomes evictable. A DBCP enables effective data prefetching in a wide spectrum of pointer-intensive, integer, and floating-point applications.We use cycle-accurate simulation of an out-of-order superscalar processor and memory-intensive benchmarks to show that: (1) dead-block prediction enhances prefetching lookahead at least by an order of magnitude as compared to previous techniques, (2) a DBP can predict dead blocks on average with a coverage of 90% only mispredicting 4% of the time, (3) a DBCP offers an address prediction coverage of 86% only mispredicting 3% of the time, and (4) DBCPs improve performance by 62% on average and 282% at best in the benchmarks we studied.