Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Cache miss heuristics and preloading techniques for general-purpose programs
Proceedings of the 28th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Correlation-based hardware prefetching
Correlation-based hardware prefetching
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
25 years of the international symposia on Computer architecture (selected papers)
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Push vs. pull: data movement for linked data structures
Proceedings of the 14th international conference on Supercomputing
Pointer cache assisted prefetching
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Enhancing memory level parallelism via recovery-free value prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
Energy efficient D-TLB and data cache using semantic-aware multilateral partitioning
Proceedings of the 2003 international symposium on Low power electronics and design
MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
When prefetching improves/degrades performance
Proceedings of the 2nd conference on Computing frontiers
Exploring the limits of prefetching
IBM Journal of Research and Development - Electrochemical technology in microelectronics
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction
IEEE Transactions on Computers
PARE: a power-aware hardware data prefetching engine
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
On the performance of trace locality of reference
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Spectral prefetcher: An effective mechanism for L2 cache prefetching
ACM Transactions on Architecture and Code Optimization (TACO)
Prefetching-aware cache line turnoff for saving leakage energy
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Decomposing memory performance: data structures and phases
Proceedings of the 5th international symposium on Memory management
Efficient emulation of hardware prefetchers via event-driven helper threading
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Overlapping dependent loads with addressless preload
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Data prefetching in a cache hierarchy with high bandwidth and capacity
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
IEEE Transactions on Computers
Future execution: A prefetching mechanism that uses multiple cores to speed up single threads
ACM Transactions on Architecture and Code Optimization (TACO)
Fairness and Throughput in Switch on Event Multithreading
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Ubiquitous memory introspection
Proceedings of the International Symposium on Code Generation and Optimization
Fairness enforcement in switch on event multithreading
ACM Transactions on Architecture and Code Optimization (TACO)
Data prefetching in a cache hierarchy with high bandwidth and capacity
ACM SIGARCH Computer Architecture News
FastForward for efficient pipeline parallelism: a cache-optimized concurrent lock-free queue
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Low-Cost Adaptive Data Prefetching
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
PFetch: software prefetching exploiting temporal predictability of memory access streams
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Coordinated control of multiple prefetchers in multi-core systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
COMPASS: a programmable data prefetcher using idle GPU shaders
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Data marshaling for multi-core architectures
Proceedings of the 37th annual international symposium on Computer architecture
Energy-efficient hardware data prefetching
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Sprint: speculative prefetching of remote data
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
When Prefetching Works, When It Doesn’t, and Why
ACM Transactions on Architecture and Code Optimization (TACO)
Energy-aware data prefetching for general-purpose programs
PACS'04 Proceedings of the 4th international conference on Power-Aware Computer Systems
Pointy: a hybrid pointer prefetcher for managed runtime systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Linearizing irregular memory accesses for improved correlated prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.02 |
Although central processor speeds continues to improve, improvements in overall system performance are increasingly hampered by memory latency, especially for pointer-intensive applications. To counter this loss of performance, numerous data and instruction prefetch mechanisms have been proposed. Recently, several proposals have posited a memory-side prefetcher; typically, these prefetchers involve a distinct processor that executes a program slice that would effectively prefetch data needed by the primary program. Alternative designs embody large state tables that learn the miss reference behavior of the processor and attempt to prefetch likely misses.This paper proposes Content-Directed Data Prefetching, a data prefetching architecture that exploits the memory allocation used by operating systems and runtime systems to improve the performance of pointer-intensive applications constructed using modern language systems. This technique is modeled after conservative garbage collection, and prefetches "likely" virtual addresses observed in memory references. This prefetching mechanism uses the underlying data of the application, and provides an 11.3% speedup using no additional processor state. By adding less than ½% space overhead to the second level cache, performance can be further increased to 12.6% across a range of "real world" applications.