Prefetching data ahead of use has the potential to tolerate the growing processor-memory performance gap by overlapping long-latency memory accesses with useful computation. While sophisticated prefetching techniques have been automated for limited domains, such as scientific codes that access dense arrays in loop nests, a similar level of success has eluded general-purpose programs, especially pointer-chasing codes written in languages such as C and C++. We address this problem by describing, implementing, and evaluating a dynamic prefetching scheme. Our technique runs on stock hardware, is completely automatic, and works for general-purpose programs, including pointer-chasing codes written in weakly typed languages such as C and C++. It operates in three phases. First, the profiling phase gathers a temporal data reference profile from a running program with low overhead. Next, in the analysis and optimization phase, profiling is turned off and a fast analysis algorithm extracts hot data streams from the temporal profile; these are data reference sequences that frequently repeat in the same order. The system then dynamically injects code at appropriate program points to detect and prefetch these hot data streams. Finally, the process enters the hibernation phase, in which no profiling or analysis is performed and the program continues to execute with the added prefetch instructions. At the end of the hibernation phase, the program is de-optimized to remove the inserted checks and prefetch instructions, and control returns to the profiling phase. For long-running programs, this cycle of profiling, analysis and optimization, and hibernation repeats multiple times. Our initial results from applying dynamic prefetching are promising, indicating overall execution time improvements of 5-19% for several memory-performance-limited SPECint2000 benchmarks running their largest (ref) inputs.
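
The notion of a hot data stream, and of detecting one at run time so that the rest of it can be prefetched, may be easier to follow with a small simulation. The sketch below is illustrative only and rests on several assumptions: the abstract does not say which analysis algorithm is used, so a naive count of fixed-length windows stands in for it; the names `extract_hot_streams` and `PrefixMatcher`, and the parameters `length`, `min_count`, and `prefix_len`, are hypothetical; and the matcher merely returns the addresses it would prefetch, whereas the real system injects checks and prefetch instructions into the running program.

```python
from collections import Counter


def extract_hot_streams(trace, length=4, min_count=3):
    """Return reference sequences of a fixed length that repeat often.

    Assumption: a plain sliding-window count, used here as a stand-in
    for the paper's (unspecified in the abstract) analysis algorithm.
    """
    counts = Counter(
        tuple(trace[i:i + length]) for i in range(len(trace) - length + 1)
    )
    return [stream for stream, n in counts.items() if n >= min_count]


class PrefixMatcher:
    """Watch references; once a hot stream's prefix is seen, report its tail."""

    def __init__(self, hot_streams, prefix_len=2):
        self.prefix_len = prefix_len
        # Map each stream prefix to the addresses that follow it.
        self.table = {}
        for stream in hot_streams:
            prefix = stream[:prefix_len]
            self.table.setdefault(prefix, []).extend(stream[prefix_len:])
        self.recent = ()

    def observe(self, addr):
        """Record one reference and return the addresses to prefetch, if any."""
        self.recent = (self.recent + (addr,))[-self.prefix_len:]
        return list(self.table.get(self.recent, []))


# A toy trace: the sequence 10, 20, 30, 40 repeats three times,
# separated by unrelated references.
trace = [10, 20, 30, 40, 1, 10, 20, 30, 40, 2, 10, 20, 30, 40, 3]
hot = extract_hot_streams(trace, length=4, min_count=3)
matcher = PrefixMatcher(hot, prefix_len=2)
for addr in (10, 20):
    to_prefetch = matcher.observe(addr)
print(hot, to_prefetch)  # [(10, 20, 30, 40)] [30, 40]
```

In this toy model, a short prefix means the tail is requested earlier, but it also raises the chance of prefetching data that is never used. A hibernation phase would correspond to running `observe` for a while with a fixed table before discarding it and profiling again.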