Lightweight reference affinity analysis

Authors:
Xipeng Shen; Yaoqing Gao; Chen Ding; Roch Archambault
Affiliations:
University of Rochester, Rochester, NY; IBM Toronto Software Lab, Toronto, ON, Canada; University of Rochester, Rochester, NY; IBM Toronto Software Lab, Toronto, ON, Canada
Venue:
Proceedings of the 19th annual international conference on Supercomputing
Year:
2005

Abstract

Previous studies have shown that array regrouping and structure splitting significantly improve data locality. The most effective technique relies on profiling every access to every data element, and this high overhead impedes its adoption in a general-purpose compiler. In this paper, we show that for array regrouping in scientific programs, the overhead is unnecessary: the same benefit can be obtained by pure program analysis.

We present an interprocedural analysis technique for array regrouping. For each global array, the analysis summarizes the access pattern as an access-frequency vector and then groups arrays with similar vectors. The analysis is context sensitive, so it tracks the exact arrays accessed in each calling context. For each loop or function call, it estimates the execution frequency with one of two methods: symbolic analysis in the compiler, or lightweight profiling of the code. The same interprocedural analysis then accumulates the overall execution frequency by considering the calling context.

We implemented a prototype of both the compiler and the profiling analysis in the IBM® compiler, evaluated array regrouping on the entire set of SPEC CPU2000 FORTRAN benchmarks, and compared the different analysis methods. Pure compiler-based array regrouping improves performance for the majority of programs, leaving little room for further improvement by code or data profiling.
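The pipeline described in the abstract (context-sensitive frequency accumulation, per-array access-frequency vectors, grouping of similar vectors) can be illustrated with a small Python sketch. This is a minimal illustration under stated assumptions, not the IBM compiler's implementation: the program model (`Loop`, `Call`, `trips`, `bind`), the normalized L1 distance, the 0.1 threshold and the greedy grouping are all hypothetical choices, and the paper's exact similarity measure may differ. The `trips` values stand in for the estimates that would come from symbolic analysis or lightweight profiling, and the call graph is assumed to be acyclic.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Call:
    callee: str
    per_iter: float  # calls to the callee per iteration of the enclosing loop
    bind: dict       # callee formal array parameter -> caller's array name

@dataclass
class Loop:
    name: str
    trips: float     # estimated iterations per entry (hypothetical: from symbolic analysis or profiling)
    accesses: dict   # array name (global or formal) -> references per iteration
    calls: list = field(default_factory=list)

def frequency_vectors(program, entry="main"):
    """Map each global array to {calling-context loop path: access frequency}."""
    vectors = defaultdict(lambda: defaultdict(float))

    def walk(func, scale, env, path):
        # Assumes an acyclic call graph; recursion would not terminate.
        for loop in program[func]:
            freq = scale * loop.trips  # body executions in this calling context
            key = path + (f"{func}:{loop.name}",)
            for arr, refs in loop.accesses.items():
                # Resolve formal parameters to the actual global array.
                vectors[env.get(arr, arr)][key] += freq * refs
            for call in loop.calls:
                callee_env = {formal: env.get(actual, actual)
                              for formal, actual in call.bind.items()}
                walk(call.callee, freq * call.per_iter, callee_env, key)

    walk(entry, 1.0, {}, ())
    return vectors

def distance(u, v):
    """Normalized L1 difference: 0.0 for identical vectors, 1.0 for disjoint ones."""
    keys = set(u) | set(v)
    diff = sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in keys)
    total = sum(u.get(k, 0.0) + v.get(k, 0.0) for k in keys)
    return diff / total if total else 0.0

def group_arrays(vectors, threshold=0.1):
    """Greedily add each array to the first group whose members are all within threshold."""
    groups = []
    for arr in sorted(vectors):
        for g in groups:
            if all(distance(vectors[arr], vectors[m]) <= threshold for m in g):
                g.append(arr)
                break
        else:
            groups.append([arr])
    return groups

# Hypothetical program: main calls smooth(A, B) and later smooth(C, D).
program = {
    "main": [
        Loop("L1", 100, {"A": 1, "B": 1}),
        Loop("L2", 10, {"C": 1}, [Call("smooth", 1, {"X": "A", "Y": "B"})]),
        Loop("L3", 10, {}, [Call("smooth", 1, {"X": "C", "Y": "D"})]),
    ],
    "smooth": [Loop("K1", 50, {"X": 1, "Y": 1})],
}

vecs = frequency_vectors(program)
groups = group_arrays(vecs)
print(groups)  # [['A', 'B'], ['C', 'D']]
```

Keying each vector dimension by the calling context (`main:L2` versus `main:L3`) keeps A and B apart from C and D even though both pairs flow through the same loop in `smooth`. Keying only by the static loop would give all four arrays a shared `smooth:K1` dimension and could merge them into one group.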