ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Data prefetching in multiprocessor vector cache memories
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Data access microarchitectures for superscalar processors with compiler-assisted data prefetching
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Prefetch unit for vector operations on scalar computers (abstract)
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Stride directed prefetching in scalar processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons
ACM Computing Surveys (CSUR)
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Compiler techniques for data prefetching on the PowerPC
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
An effective programmable prefetch engine for on-chip caches
Proceedings of the 28th annual international symposium on Microarchitecture
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Compiler-directed data prefetching in multiprocessors with memory hierarchies
ICS '90 Proceedings of the 4th international conference on Supercomputing
Prefetching using Markov predictors
Proceedings of the 24th annual international symposium on Computer architecture
Data prefetching on the HP PA-8000
Proceedings of the 24th annual international symposium on Computer architecture
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
The performance impact of block sizes and fetch strategies
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
Exposing I/O concurrency with informed prefetching
PDIS '94 Proceedings of the third international conference on on Parallel and distributed information systems
The MIPS R10000 Superscalar Microprocessor
IEEE Micro
Limited Bandwidth to Affect Processor Design
IEEE Micro
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
Branch-Directed and Stride-Based Data Cache Prefetching
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Effectiveness of hardware-based stride and sequential prefetching in shared-memory multiprocessors
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Distributed Prefetch-buffer/Cache Design for High Performance Memory Systems
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A Compiler-Assisted Data Prefetch Controller
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
Filtering algorithms and implementation for very fast publish/subscribe systems
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Dynamic hot data stream prefetching for general-purpose programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Conjunctive selection conditions in main memory
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Going the distance for TLB prefetching: an application-driven study
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Simple and effective array prefetching in Java
JGI '02 Proceedings of the 2002 joint ACM-ISCOPE conference on Java Grande
Increasing hardware data prefetching performance using the second-level cache
Journal of Systems Architecture: the EUROMICRO Journal
An Agent Enhanced Framework to Support Pre-dispatching of Tasks in Workflow Management Systems
EDCIS '02 Proceedings of the First International Conference on Engineering and Deployment of Cooperative Information Systems
Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
A new hybrid approach to exploit localities: LRFU with adaptive prefetching
ACM SIGMETRICS Performance Evaluation Review
Selection conditions in main memory
ACM Transactions on Database Systems (TODS)
Static Identification of Delinquent Loads
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Prefetch injection based on hardware monitoring and object metadata
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
The Impact of Incorrectly Speculated Memory Operations in a Multithreaded Architecture
IEEE Transactions on Parallel and Distributed Systems
Addressing mode driven low power data caches for embedded processors
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
High Efficiency Counter Mode Security Architecture via Prediction and Precomputation
Proceedings of the 32nd annual international symposium on Computer Architecture
Generating cache hints for improved program efficiency
Journal of Systems Architecture: the EUROMICRO Journal
On the performance of trace locality of reference
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Aggregating processor free time for energy reduction
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Design and Implementation of a Compiler Framework for Helper Threading on Multi-core Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Prefetching-aware cache line turnoff for saving leakage energy
ASP-DAC '06 Proceedings of the 2006 Asia and South Pacific Design Automation Conference
Studying interactions between prefetching and cache line turnoff
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Improving locality with parallel hierarchical copying GC
Proceedings of the 5th international symposium on Memory management
On the Design and Implementation of an Effective Prefetch Strategy for DSM Systems
The Journal of Supercomputing
Overlapping dependent loads with addressless preload
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A PAB-based multi-prefetcher mechanism
International Journal of Parallel Programming
Data prefetching in a cache hierarchy with high bandwidth and capacity
MEDEA '06 Proceedings of the 2006 workshop on MEmory performance: DEaling with Applications, systems and architectures
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Hybrid multi-core architecture for boosting single-threaded performance
ACM SIGARCH Computer Architecture News
Accelerating memory decryption and authentication with frequent value prediction
Proceedings of the 4th international conference on Computing frontiers
SARC: sequential prefetching in adaptive replacement cache
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
Comparing memory systems for chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Matrix-Stripe-Cache-Based Contiguity Transform for Fragmented Writes in RAID-5
IEEE Transactions on Computers
Performance driven data cache prefetching in a dynamic software optimization system
Proceedings of the 21st annual international conference on Supercomputing
Forma: A framework for safe automatic array reshaping
ACM Transactions on Programming Languages and Systems (TOPLAS)
Data prefetching in a cache hierarchy with high bandwidth and capacity
ACM SIGARCH Computer Architecture News
Prefetching irregular references for software cache on cell
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
An Extended R-Tree Indexing Method Using Selective Prefetching in Main Memory
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007
Capturing and optimizing the interactions between prefetching and cache line turnoff
Microprocessors & Microsystems
Static analysis of processor stall cycle aggregation
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Comparative evaluation of memory models for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Runtime engine for dynamic profile guided stride prefetching
Journal of Computer Science and Technology
Dictionary-based order-preserving string compression for main memory column stores
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Multiprocessor System-on-Chip designs with active memory processors for higher memory efficiency
Proceedings of the 46th Annual Design Automation Conference
Blue Gene/L compute chip: memory and Ethernet subsystem
IBM Journal of Research and Development
Algorithms for memory hierarchies: advanced lectures
Algorithms for memory hierarchies: advanced lectures
Using prefetching to improve reference-counting garbage collectors
CC'07 Proceedings of the 16th international conference on Compiler construction
Cashing in on hints for better prefetching and caching in PVFS and MPI-IO
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Fast and compact hash tables for integer keys
ACSC '09 Proceedings of the Thirty-Second Australasian Conference on Computer Science - Volume 91
Adaptive prefetching for shared cache based chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Coterminous locality and coterminous group data prefetching on chip-multiprocessors
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Redesigning the string hash table, burst trie, and BST to exploit cache
Journal of Experimental Algorithmics (JEA)
C-pack: a high-performance microprocessor cache compression algorithm
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Constructing application-specific memory hierarchies on FPGAs
Transactions on high-performance embedded architectures and compilers III
Resolving a L2-prefetch-caused parallel nonscaling on Intel Core microarchitecture
Journal of Parallel and Distributed Computing
A reuse-aware prefetching scheme for scratchpad memory
Proceedings of the 48th Design Automation Conference
Bandwidth constrained coordinated HW/SW prefetching for multicores
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
Virtual I/O caching: dynamic storage cache management for concurrent workloads
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
A framework for compiler driven design space exploration for embedded system customization
ASIAN'04 Proceedings of the 9th Asian Computing Science conference on Advances in Computer Science: dedicated to Jean-Louis Lassez on the Occasion of His 5th Cycle Birthday
When Prefetching Works, When It Doesn’t, and Why
ACM Transactions on Architecture and Code Optimization (TACO)
PSnAP: accurate synthetic address streams through memory profiles
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Application-Specific hardware-driven prefetching to improve data cache performance
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
SAD prefetching for MPEG4 using flux caches
SAMOS'06 Proceedings of the 6th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation
An efficient hardware architecture from c program with memory access to hardware
ICCSA'10 Proceedings of the 2010 international conference on Computational Science and Its Applications - Volume Part II
PICA: Processor Idle Cycle Aggregation for Energy-Efficient Embedded Systems
ACM Transactions on Embedded Computing Systems (TECS)
Design of the IBM Blue Gene/Q compute chip
IBM Journal of Research and Development
Portable, flexible, and scalable soft vector processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
S/DC: a storage and energy efficient data prefetcher
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Multiverse: efficiently supporting distributed high-level speculation
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Toward application-specific memory reconfiguration for energy efficiency
E2SC '13 Proceedings of the 1st International Workshop on Energy Efficient Supercomputing
Hi-index | 0.00 |
The expanding gap between microprocessor and DRAM performance has necessitated the use of increasingly aggressive techniques designed to reduce or hide the latency of main memory access. Although large cache hierarchies have proven to be effective in reducing this latency for the most frequently used data, it is still not uncommon for many programs to spend more than half their run times stalled on memory requests. Data prefetching has been proposed as a technique for hiding the access latency of data referencing patterns that defeat caching strategies. Rather than waiting for a cache miss to initiate a memory fetch, data prefetching anticipates such misses and issues a fetch to the memory system in advance of the actual memory reference. To be effective, prefetching must be implemented in such a way that prefetches are timely, useful, and introduce little overhead. Secondary effects such as cache pollution and increased memory bandwidth requirements must also be taken into consideration. Despite these obstacles, prefetching has the potential to significantly improve overall program execution time by overlapping computation with memory accesses. Prefetching strategies are diverse, and no single strategy has yet been proposed that provides optimal performance. The following survey examines several alternative approaches, and discusses the design tradeoffs involved when implementing a data prefetch strategy.