Efficient representations and abstractions for quantifying and exploiting data reference locality

Authors:
Trishul M. Chilimbi
Affiliations:
Microsoft Research, One Microsoft Way, Redmond, WA
Venue:
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Year:
2001

Abstract

With the growing processor-memory performance gap, understanding and optimizing a program's reference locality, and consequently its cache performance, is becoming increasingly important. Unfortunately, current reference locality optimizations rely on heuristics and are fairly ad hoc. In addition, while optimization technology for improving instruction cache performance is fairly mature (though heuristic-based), data cache optimizations are still at an early stage. We believe the primary reason for this imbalance is the lack of a suitable representation of a program's dynamic data reference behavior and a quantitative basis for understanding this behavior.

We address these issues by proposing a quantitative basis for understanding and optimizing reference locality, and by describing efficient data reference representations and an exploitable locality abstraction that support this framework. Our data reference representations (Whole Program Streams and Stream Flow Graphs) are compact (two to four orders of magnitude smaller than the program's data reference trace) and permit efficient analysis (on the order of seconds to a few minutes), even for complex applications. These representations can be used to efficiently compute our exploitable locality abstraction (hot data streams). We demonstrate that these representations and our hot data stream abstraction are useful for quantifying and exploiting data reference locality. We applied our framework to several SPECint 2000 benchmarks, a graphics program, and a commercial Microsoft database application. The results suggest significant opportunity for hot data stream-based locality optimizations.
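To make the hot data stream idea concrete, the following is a minimal sketch and not the paper's algorithm. It scans a toy data reference trace by brute force rather than building Whole Program Streams or Stream Flow Graphs. The function name `hot_data_streams`, its parameters, and the definition of "heat" as subsequence length times number of occurrences are all assumptions made for illustration; the abstract does not define them.

```python
from collections import Counter


def hot_data_streams(trace, min_len=2, max_len=4, heat_threshold=6):
    """Return {subsequence: heat} for repeated subsequences of a trace.

    Assumption (not from the abstract): heat = length * occurrences,
    where occurrences counts every position the subsequence starts at.
    """
    counts = Counter()
    n = len(trace)
    # Count every contiguous subsequence of each permitted length.
    for length in range(min_len, max_len + 1):
        for start in range(n - length + 1):
            counts[tuple(trace[start:start + length])] += 1
    # Keep only subsequences that repeat and meet the heat threshold.
    return {
        seq: len(seq) * freq
        for seq, freq in counts.items()
        if freq > 1 and len(seq) * freq >= heat_threshold
    }


# Hypothetical trace: symbolic names stand in for data addresses.
trace = ["a", "b", "c", "x", "a", "b", "c", "y", "a", "b", "c"]
streams = hot_data_streams(trace)
for seq, heat in sorted(streams.items(), key=lambda kv: -kv[1]):
    print(seq, heat)
```

On this toy trace the sequence a, b, c recurs three times and is reported as the hottest stream. The brute-force scan is quadratic in the worst case and works on the raw trace, which is precisely the cost that a compact representation is meant to avoid for real programs.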