Scalable Implementation of Efficient Locality Approximation

Authors:
Xipeng Shen; Jonathan Shaw
Affiliations:
Computer Science Department, The College of William and Mary, Williamsburg; Shaw Technologies, Inc., Tualatin, OR
Venue:
Languages and Compilers for Parallel Computing
Year:
2008

Abstract

As memory hierarchies become deeper and are shared by more processors, locality increasingly determines system performance. As a rigorous and precise locality model, reuse distance has been used in program optimization, performance prediction, memory disambiguation, and locality phase prediction. However, its high measurement cost has severely impeded its use in scenarios that demand high efficiency, such as production compilers, performance debugging, and run-time optimization. We recently discovered a statistical connection between time distance and reuse distance, which led to an efficient way to approximate reuse distance using time. However, some algorithmic and implementation techniques that are vital to the efficiency and scalability of the approximation model have not yet been described. This paper presents these techniques. Specifically, it describes an algorithm that approximates reuse distance at arbitrary scales; it explains a portable scheme that employs the memory controller to accelerate the measurement of time distance; and it presents the algorithm, with its proof, for a trace generator that can facilitate various locality studies.
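To make the two metrics in the abstract concrete, the following Python sketch computes, for every access in a small trace, the exact reuse distance (the number of distinct data elements accessed between an access and the previous access to the same element) and the time distance (taken here as the difference in logical time between those two accesses). The function name `distances` and this particular time-distance convention are assumptions made for illustration. This is not the paper's approximation algorithm, which infers reuse distance statistically from time distance.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def distances(trace: Sequence[Hashable]) -> List[Tuple[Optional[int], Optional[int]]]:
    """Return (reuse_distance, time_distance) for each access in `trace`.

    Hypothetical helper for illustration. Both values are None for the
    first access to an element (a cold access with no prior reuse).
    """
    last_seen: Dict[Hashable, int] = {}  # element -> logical time of its last access
    result: List[Tuple[Optional[int], Optional[int]]] = []
    for t, elem in enumerate(trace):
        prev = last_seen.get(elem)
        if prev is None:
            result.append((None, None))
        else:
            # Time distance (assumed convention): gap in logical time,
            # found with one dictionary lookup per access.
            time_dist = t - prev
            # Reuse distance: distinct elements strictly between the two
            # accesses. This naive window scan is what makes exact
            # measurement expensive on long traces.
            reuse_dist = len(set(trace[prev + 1:t]))
            result.append((reuse_dist, time_dist))
        last_seen[elem] = t
    return result


if __name__ == "__main__":
    for access, (rd, td) in zip("abcaacb", distances("abcaacb")):
        print(access, rd, td)
```

For the trace `abcaacb`, the second access to `a` has reuse distance 2 (only `b` and `c` intervene) and time distance 3, while the final `b` has reuse distance 2 but time distance 5, because repeated accesses (`a a`, `c`) inflate time distance without adding distinct elements. Exact tools usually replace the window scan with a search tree, but the per-access cost still exceeds the single lookup that time distance needs, which is the gap the time-based approximation exploits.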