An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Complexity/performance tradeoffs with non-blocking loads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Cache miss heuristics and preloading techniques for general-purpose programs
Proceedings of the 28th annual international symposium on Microarchitecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Distributed Prefetch-buffer/Cache Design for High Performance Memory Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Data prefetch mechanisms for accelerating symbolic and numeric computation
Data prefetch mechanisms for accelerating symbolic and numeric computation
Hardware identification of cache conflict misses
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Dead-block prediction & dead-block correlating prefetchers
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Runtime identification of cache conflict misses: The adaptive miss buffer
ACM Transactions on Computer Systems (TOCS)
Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Execution history guided instruction prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
DSTRIDE: data-cache miss-address-based stride prefetching scheme for multimedia processors
ACSAC '01 Proceedings of the 6th Australasian conference on Computer systems architecture
Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Improving cache hit ratio by extended referencing cache lines
Journal of Computing Sciences in Colleges
Enhancing memory level parallelism via recovery-free value prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Detecting global stride locality in value streams
Proceedings of the 30th annual international symposium on Computer architecture
Execution History Guided Instruction Prefetching
The Journal of Supercomputing
Reducing disk I/O times using anticipatory movements of the disk head
Journal of Systems Architecture: the EUROMICRO Journal
Effective stream-based and execution-based data prefetching
Proceedings of the 18th annual international conference on Supercomputing
Compiler orchestrated prefetching via speculation and predication
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Addressing mode driven low power data caches for embedded processors
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Automatic measurement of memory hierarchy parameters
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction
IEEE Transactions on Computers
On the performance of trace locality of reference
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Spectral prefetcher: An effective mechanism for L2 cache prefetching
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamic memory optimization using pool allocation and prefetching
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Data prefetching in a cache hierarchy with high bandwidth and capacity
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Proceedings of the 2006 workshop on Memory system performance and correctness
Optimal multistream sequential prefetching in a shared cache
ACM Transactions on Storage (TOS)
Path: page access tracking to improve memory management
Proceedings of the 6th international symposium on Memory management
Data prefetching in a cache hierarchy with high bandwidth and capacity
ACM SIGARCH Computer Architecture News
Focused prefetching: performance oriented prefetching based on commit stalls
Proceedings of the 22nd annual international conference on Supercomputing
Prefetching with adaptive cache culling for striped disk arrays
ATC'08 USENIX 2008 Annual Technical Conference on Annual Technical Conference
Guided Prefetching Based on Runtime Access Patterns
ICCS '08 Proceedings of the 8th international conference on Computational Science, Part III
Low-Cost Adaptive Data Prefetching
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Automatic Prefetching with Binary Code Rewriting in Object-Based DSMs
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Capturing and optimizing the interactions between prefetching and cache line turnoff
Microprocessors & Microsystems
Markov Model Based Disk Power Management for Data Intensive Workloads
CCGRID '09 Proceedings of the 2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
Efficient Data Access Management for FPGA-Based Image Processing SoCs
RSP '09 Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping
Extended histories: improving regularity and performance in correlation prefetchers
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Cost Minimization with HPDFG and Data Mining for Heterogeneous DSP
Journal of Signal Processing Systems
Esodyp+: prefetching in the Jackal software DSM
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Reducing Power and Energy Overhead in Instruction Prefetching for Embedded Processor Systems
International Journal of Handheld Computing Research
OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
Prefetching is one approach to reducing the latency of memory operations in modern computer systems. In this paper, we describe the Markov prefetcher. This prefetcher acts as an interface between the on-chip and off-chip cache and can be added to existing computer designs. The Markov prefetcher is distinguished by prefetching multiple reference predictions from the memory subsystem, and then prioritizing the delivery of those references to the processor. This design results in a prefetching system that provides good coverage, is accurate, and produces timely results that can be effectively used by the processor. We also explored a range of techniques that can be used to reduce the bandwidth demands of prefetching, leading to improved memory system performance. In our cycle-level simulations, the Markov Prefetcher reduces the overall execution stalls due to instruction and data memory operations by an average of 54 percent for various commercial benchmarks while only using two-thirds the memory of a demand-fetch cache organization.