A model for hierarchical memory
STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
The complexity of multiway cuts (extended abstract)
STOC '92 Proceedings of the twenty-fourth annual ACM symposium on Theory of computing
Region-based compilation: an introduction and motivation
Proceedings of the 28th annual international symposium on Microarchitecture
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Cache-conscious data placement
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Automatic data layout for distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Transforming loops to recursion for multi-level memory hierarchies
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
On the complexity of loop fusion
Parallel Computing - Special issue on new trends on scheduling in parallel and distributed systems
Data mining: concepts and techniques
Data mining: concepts and techniques
Efficient representations and abstractions for quantifying and exploiting data reference locality
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
The hardness of cache conscious data placement
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
International Journal of Parallel Programming
Reuse Distance-Based Cache Hint Selection
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Compile-time composition of run-time data and iteration reorderings
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Cache management by the compiler
Cache management by the compiler
Miss Rate Prediction across All Program Inputs
Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques
Improving effective bandwidth through compiler enhancement of global cache reuse
Journal of Parallel and Distributed Computing
Array regrouping and structure splitting using whole-program reference affinity
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Cross-architecture performance predictions for scientific applications using parameterized models
Proceedings of the joint international conference on Measurement and modeling of computer systems
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Predicting Inter-Thread Cache Contention on a Chip Multi-Processor Architecture
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Metrics and models for reordering transformations
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Improving the computational intensity of unstructured mesh applications
Proceedings of the 19th annual international conference on Supercomputing
Instruction Based Memory Distance Analysis and its Application
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Data layouts for object-oriented programs
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Online Phase-Adaptive Data Layout Selection
ECOOP '08 Proceedings of the 22nd European conference on Object-Oriented Programming
Exploiting Spatial Locality for Objects Layout in Virtual Environments
PCM '08 Proceedings of the 9th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
A component model of spatial locality
Proceedings of the 2009 international symposium on Memory management
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
WSEAS Transactions on Information Science and Applications
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Cache topology aware computation mapping for multicores
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
A graph theoretic approach to cache-conscious placement of data for direct mapped caches
Proceedings of the 2010 international symposium on Memory management
Program phase detection and exploitation
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Studying inter-core data reuse in multicores
Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Studying inter-core data reuse in multicores
ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Discovery of locality-improving refactorings by reuse path analysis
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Neighborhood-aware data locality optimization for NoC-based multicores
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Reshaping cache misses to improve row-buffer locality in multicore systems
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
In POPL 2002, Petrank and Rawitz showed a universal result---finding optimal data placement is not only NP-hard but also impossible to approximate within a constant factor if P ≠ NP. Here we study a recently published concept called reference affinity, which characterizes a group of data that are always accessed together in computation. On the theoretical side, we give the complexity for finding reference affinity in program traces, using a novel reduction that converts the notion of distance into satisfiability. We also prove that reference affinity automatically captures the hierarchical locality in divide-and-conquer computations including matrix solvers and N-body simulation. The proof establishes formal links between computation patterns in time and locality relations in space.On the practical side, we show that efficient heuristics exist. In particular, we present a sampling method and show that it is more effective than the previously published technique, especially for data that are often but not always accessed together. We show the effect on generated and real traces. These theoretical and empirical results demonstrate that effective data placement is still attainable in general-purpose programs because common (albeit not all) locality patterns can be precisely modeled and efficiently analyzed.