ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Using profile information to assist classic code optimizations
Software—Practice & Experience
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Predicting conditional branch directions from previous runs of a program
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Effective compiler support for predicated execution using the hyperblock
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Optimally profiling and tracing programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Compiler-based prefetching for recursive data structures
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Data prefetching on the HP PA-8000
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Dependence based prefetching for linked data structures
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Effective jump-pointer prefetching for linked data structures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Predictor-directed stream buffers
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
On the importance of points-to analysis and other memory disambiguation methods for C programs
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Execution-based prediction using speculative slices
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Speculative precomputation: long-range prefetching of delinquent loads
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Code Specialization Based on Value Profiles
SAS '00 Proceedings of the 7th International Symposium on Static Analysis
Value-Profile Guided Stride Prefetching for Irregular Code
CC '02 Proceedings of the 11th International Conference on Compiler Construction
Memory performance analysis of SPEC2000C for the Intel(R) Itanium/sup TM/ processor
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Data remapping for design space optimization of embedded memory systems
ACM Transactions on Embedded Computing Systems (TECS)
Compiler managed micro-cache bypassing for high performance EPIC processors
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Stride prefetching by dynamically inspecting objects
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Enhancing memory level parallelism via recovery-free value prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Guided region prefetching: a cooperative hardware/software approach
Proceedings of the 30th annual international symposium on Computer architecture
The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Ispike: A Post-link Optimizer for the Intel®Itanium®Architecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Exposing Memory Access Regularities Using Object-Relative Memory Profiling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Compiler orchestrated prefetching via speculation and predication
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction
IEEE Transactions on Computers
On the performance of trace locality of reference
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Design and Implementation of a Compiler Framework for Helper Threading on Multi-core Processors
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
A Self-Repairing Prefetcher in an Event-Driven Dynamic Optimization Framework
Proceedings of the International Symposium on Code Generation and Optimization
Dynamic memory optimization using pool allocation and prefetching
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Improving locality with parallel hierarchical copying GC
Proceedings of the 5th international symposium on Memory management
IEEE Transactions on Computers
Compiler driven data layout optimization for regular/irregular array access patterns
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Distilling the essence of proprietary workloads into miniature benchmarks
ACM Transactions on Architecture and Code Optimization (TACO)
Languages and Compilers for Parallel Computing
PFetch: software prefetching exploiting temporal predictability of memory access streams
Proceedings of the 9th workshop on MEmory performance: DEaling with Applications, systems and architecture
Runtime engine for dynamic profile guided stride prefetching
Journal of Computer Science and Technology
Analysis and performance results of computing betweenness centrality on IBM Cyclops64
The Journal of Supercomputing
Energy-efficient hardware data prefetching
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Data-Layout optimization using reuse distance distribution
EUC'06 Proceedings of the 2006 international conference on Emerging Directions in Embedded and Ubiquitous Computing
When Prefetching Works, When It Doesn’t, and Why
ACM Transactions on Architecture and Code Optimization (TACO)
An overview of the open research compiler
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Pointy: a hybrid pointer prefetcher for managed runtime systems
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Meeting midway: improving CMP performance with memory-side prefetching
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Simple profile rectifications go a long way
ECOOP'13 Proceedings of the 27th European conference on Object-Oriented Programming
Hi-index | 0.01 |
Irregular data references are difficult to prefetch, as the future memory address of a load instruction is hard to anticipate by a compiler. However, recent studies as well as our experience indicate that some important load instructions in irregular programs contain stride access patterns. Although the load instructions with stride patterns are difficult to identify with static compiler techniques, we developed an efficient profiling method to discover these load instructions. The new profiling method integrates the profiling for stride information and the traditional profiling for edge frequency into a single profiling pass. The integrated profiling pass runs only 17% slower than the frequency profiling alone. The collected stride information helps the compiler to identify load instructions with stride patterns that can be prefetched efficiently and beneficially. We implemented the new profiling and prefetching techniques in a research compiler for Itanium Processor Family (IPF), and obtained significant performance improvement for the SPECINT2000 programs running on Itanium machines. For example, we achieved a 1.59x speedup for 181.mcf, 1.14x for 254.gap, and 1.08x for 197.parser. We also showed that the performance gain is stable across input data sets. These benefits make the new profiling and prefetching techniques suitable for production compilers.