An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Complexity/performance tradeoffs with non-blocking loads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Cache miss heuristics and preloading techniques for general-purpose programs
Proceedings of the 28th annual international symposium on Microarchitecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Distributed Prefetch-buffer/Cache Design for High Performance Memory Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
Prediction caches for superscalar processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Characterization and improvement of load/store cache-based prefetching
ICS '98 Proceedings of the 12th international conference on Supercomputing
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Capturing dynamic memory reference behavior with adaptive cache topology
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Automatic Compiler-Inserted Prefetching for Pointer-Based Applications
IEEE Transactions on Computers - Special issue on cache memory and related problems
Memory sharing predictor: the key to a speculative coherent DSM
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Cyclic dependence based data reference prediction
ICS '99 Proceedings of the 13th international conference on Supercomputing
Fetch directed instruction prefetching
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Limits of Data Value Predictability
International Journal of Parallel Programming
Push vs. pull: data movement for linked data structures
Proceedings of the 14th international conference on Supercomputing
Proceedings of the 27th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Predictor-directed stream buffers
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Architectural and compiler support for effective instruction prefetching: a cooperative approach
ACM Transactions on Computer Systems (TOCS)
ICS '01 Proceedings of the 15th international conference on Supercomputing
A novel renaming mechanism that boosts software prefetching
ICS '01 Proceedings of the 15th international conference on Supercomputing
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
FPGA '02 Proceedings of the 2002 ACM/SIGDA tenth international symposium on Field-programmable gate arrays
Dynamic hot data stream prefetching for general-purpose programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Profile-guided post-link stride prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Using a user-level memory thread for correlation prefetching
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Timekeeping in the memory system: predicting and optimizing memory behavior
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Exploiting speculative value reuse using value prediction
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Dynamic speculative precomputation
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
A stateless, content-directed data prefetching mechanism
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A Decoupled Predictor-Directed Stream Prefetching Architecture
IEEE Transactions on Computers
A Programmable Memory Hierarchy for Prefetching Linked Data Structures
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Content-Based Prefetching: Initial Results
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Pointer cache assisted prefetching
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
TCP: Tag Correlating Prefetchers
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
DRAM-Page Based Prediction and Prefetching
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Proceedings of the 30th annual international symposium on Computer architecture
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
Correlation Prefetching with a User-Level Memory Thread
IEEE Transactions on Parallel and Distributed Systems
Call graph prefetching for database applications
ACM Transactions on Computer Systems (TOCS)
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
IEEE Transactions on Computers
Receiving message prediction method
Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
A first glance at Kilo-instruction based multiprocessors
Proceedings of the 1st conference on Computing frontiers
ACM Transactions on Computer Systems (TOCS)
Cluster miss prediction with prefetch on miss for embedded CPU instruction caches
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
MicroLib: A Case for the Quantitative Comparison of Micro-Architecture Mechanisms
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Managing Wire Delay in Large Chip-Multiprocessor Caches
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Toward kilo-instruction processors
ACM Transactions on Architecture and Code Optimization (TACO)
Tolerating memory latency through push prefetching for pointer-intensive applications
ACM Transactions on Architecture and Code Optimization (TACO)
Effective Instruction Prefetching via Fetch Prestaging
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Memory predecryption: hiding the latency overhead of memory encryption
ACM SIGARCH Computer Architecture News - Special issue: Workshop on architectural support for security and anti-virus (WASSA)
Exploring the limits of prefetching
IBM Journal of Research and Development - Electrochemical technology in microelectronics
Memory-side prefetching for linked data structures for processor-in-memory systems
Journal of Parallel and Distributed Computing
Temporal Streaming of Shared Memory
Proceedings of the 32nd annual international symposium on Computer Architecture
Whole execution traces and their applications
ACM Transactions on Architecture and Code Optimization (TACO)
Store-Ordered Streaming of Shared Memory
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Speculative execution for hiding memory latency
MEDEA '04 Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications , systems and architecture
Kilo-instruction processors, runahead and prefetching
Proceedings of the 3rd conference on Computing frontiers
CAVA: Using checkpoint-assisted value prediction to hide L2 misses
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamic feature selection for hardware prediction
Journal of Systems Architecture: the EUROMICRO Journal
International Journal of Parallel Programming
Efficient emulation of hardware prefetchers via event-driven helper threading
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Overlapping dependent loads with addressless preload
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Future execution: A prefetching mechanism that uses multiple cores to speed up single threads
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing Cache Pollution via Dynamic Data Prefetch Filtering
IEEE Transactions on Computers
Memory Prefetching Using Adaptive Stream Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Accelerating sequential programs on Chip Multiprocessors via Dynamic Prefetching Thread
Microprocessors & Microsystems
Analysis of hardware prefetching across virtual page boundaries
Proceedings of the 4th international conference on Computing frontiers
WCET analysis of instruction caches with prefetching
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A low power front-end for embedded processors using a block-aware instruction set
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Data access history cache and associated data prefetching mechanisms
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Proceedings of the 5th conference on Computing frontiers
Server-based data push architecture for multi-processor environments
Journal of Computer Science and Technology
Hiding cache miss penalty using priority-based execution for embedded processors
Proceedings of the conference on Design, automation and test in Europe
Analyzing the worst-case execution time for instruction caches with prefetching
ACM Transactions on Embedded Computing Systems (TECS)
Communication Based Proactive Link Power Management
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
PFetch: software prefetching exploiting temporal predictability of memory access streams
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Prefetch-Aware DRAM Controllers
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Spatio-temporal memory streaming
Proceedings of the 36th annual international symposium on Computer architecture
Stream chaining: exploiting multiple levels of correlation in data prefetching
Proceedings of the 36th annual international symposium on Computer architecture
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Machine learning-based prefetch optimization for data center applications
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Coordinated control of multiple prefetchers in multi-core systems
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Symbolic prefetching in transactional distributed shared memory
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
COMPASS: a programmable data prefetcher using idle GPU shaders
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Inter-core cooperative TLB for chip multiprocessors
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
ACM Transactions on Architecture and Code Optimization (TACO)
The bandwidth expansion effectiveness of cache levels block prefetch
ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
Extending data prefetching to cope with context switch misses
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Timing local streams: improving timeliness in data prefetching
Proceedings of the 24th ACM International Conference on Supercomputing
Data marshaling for multi-core architectures
Proceedings of the 37th annual international symposium on Computer architecture
An Adaptive Data Prefetcher for High-Performance Processors
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Engineering scalable, cache and space efficient tries for strings
The VLDB Journal — The International Journal on Very Large Data Bases
Coterminous locality and coterminous group data prefetching on chip-multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
i-NVMM: a secure non-volatile main memory system with incremental encryption
Proceedings of the 38th annual international symposium on Computer architecture
Automatically generating symbolic prefetches for distributed transactional memories
Proceedings of the ACM/IFIP/USENIX 11th International Conference on Middleware
Global-aware and multi-order context-based prefetching for high-performance processors
International Journal of High Performance Computing Applications
The migration prefetcher: Anticipating data promotion in dynamic NUCA caches
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Do trace cache, value prediction and prefetching improve SMT throughput?
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
When Prefetching Works, When It Doesn’t, and Why
ACM Transactions on Architecture and Code Optimization (TACO)
Application-Specific hardware-driven prefetching to improve data cache performance
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Microvisor: a runtime architecture for thermal management in chip multiprocessors
Transactions on High-Performance Embedded Architectures and Compilers IV
Communication based proactive link power management
Transactions on High-Performance Embedded Architectures and Compilers IV
Improving dynamic prediction accuracy through multi-level phase analysis
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Boosting mobile GPU performance with a decoupled access/execute fragment processor
Proceedings of the 39th Annual International Symposium on Computer Architecture
To satisfy impatient web surfers is hard
FUN'12 Proceedings of the 6th international conference on Fun with Algorithms
Pointy: a hybrid pointer prefetcher for managed runtime systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Making data prefetch smarter: adaptive prefetching on POWER7
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Application data prefetching on the IBM blue gene/Q supercomputer
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
ACM Transactions on Architecture and Code Optimization (TACO)
Orchestrated scheduling and prefetching for GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture
S/DC: a storage and energy efficient data prefetcher
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Meeting midway: improving CMP performance with memory-side prefetching
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Network-aware data caching and prefetching for cloud-hosted metadata retrieval
NDM '13 Proceedings of the Third International Workshop on Network-Aware Data Management
Linearizing irregular memory accesses for improved correlated prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
The case of using multiple streams in streaming
International Journal of Automation and Computing
Hi-index | 0.01 |
Prefetching is one approach to reducing the latency of memory operations in modern computer systems. In this paper, we describe the Markov prefetcher. This prefetcher acts as an interface between the on-chip and off-chip cache, and can be added to existing computer designs. The Markov prefetcher is distinguished by prefetching multiple reference predictions from the memory subsystem, and then prioritizing the delivery of those references to the processor.This design results in a prefetching system that provides good coverage, is accurate and produces timely results that can be effectively used by the processor. In our cycle-level simulations, the Markov Prefetcher reduces the overall execution stalls due to instruction and data memory operations by an average of 54% for various commercial benchmarks while only using two thirds the memory of a demand-fetch cache organization.