Array regrouping and structure splitting using whole-program reference affinity

Authors:
Yutao Zhong; Maksim Orlovich; Xipeng Shen; Chen Ding
Affiliations:
University of Rochester (all authors)
Venue:
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Year:
2004


Abstract

While the memory of most machines is organized as a hierarchy, program data are laid out in a uniform address space. This paper defines a model of reference affinity, which measures how closely a group of data are accessed together in a reference trace. It proves that the model gives a hierarchical partition of program data: at the top is the set of all data, with the weakest affinity; at the bottom is each individual data element, with the strongest affinity. Based on the theoretical model, the paper presents k-distance analysis, a practical test for the hierarchical affinity of source-level data. When used for array regrouping and structure splitting, k-distance analysis consistently outperforms data organizations given by the programmer, compiler analysis, frequency profiling, statistical clustering, and all other methods we have tried.
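
The idea of a hierarchical partition can be illustrated with a small sketch. This is not the paper's k-distance analysis or its formal affinity definition; it is a simplified stand-in that rests on assumptions made here for illustration: distance is counted in trace positions, two elements are treated as "linked" at threshold k if every access to one has an access to the other within k positions, and groups are the connected components of that relation. The names `linked` and `affinity_groups` and the sample trace are hypothetical.

```python
from itertools import combinations


def linked(trace, x, y, k):
    """Assumed affinity test: every access to x has an access to y
    within k trace positions, and vice versa."""
    pos = {
        x: [i for i, e in enumerate(trace) if e == x],
        y: [i for i, e in enumerate(trace) if e == y],
    }

    def covered(a, b):
        # Each access to a must have some access to b at most k positions away.
        return all(any(abs(i - j) <= k for j in pos[b]) for i in pos[a])

    return covered(x, y) and covered(y, x)


def affinity_groups(trace, k):
    """Partition the elements of the trace into connected components
    of the 'linked' relation at threshold k (union-find)."""
    elems = sorted(set(trace))
    parent = {e: e for e in elems}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]  # path halving
            e = parent[e]
        return e

    for x, y in combinations(elems, 2):
        if linked(trace, x, y, k):
            parent[find(x)] = find(y)

    groups = {}
    for e in elems:
        groups.setdefault(find(e), []).append(e)
    return sorted(groups.values())


# Hypothetical reference trace over four data elements.
trace = "a b a b c d c d a b".split()

for k in (0, 1, len(trace)):
    print(k, affinity_groups(trace, k))
```

Under these assumptions, a pair linked at a small k stays linked at any larger k, so the groups at a small threshold are nested inside the groups at a larger one. For the sample trace, k = 0 leaves every element in its own group, k = 1 pairs `a` with `b` and `c` with `d`, and a threshold as long as the trace merges everything into one group, mirroring the bottom-to-top hierarchy the abstract describes.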