New tiling techniques to improve cache temporal locality
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An integer linear programming approach for optimizing cache locality
ICS '99 Proceedings of the 13th international conference on Supercomputing
A compiler technique for improving whole-program locality
POPL '01 Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Static and Dynamic Locality Optimizations Using Integer Linear Programming
IEEE Transactions on Parallel and Distributed Systems
Integrating loop and data transformations for global optimization
Journal of Parallel and Distributed Computing
An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets
The Journal of Supercomputing
A Layout-Conscious Iteration Space Transformation Technique
IEEE Transactions on Computers
A Loop Transformation Algorithm Based on Explicit Data Layout Representation for Optimizing Locality
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
High Performance Numerical Computing in Java: Language and Compiler Issues
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Using the Compiler to Improve Cache Replacement Decisions
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework
IEEE Transactions on Parallel and Distributed Systems
Improving effective bandwidth through compiler enhancement of global cache reuse
Journal of Parallel and Distributed Computing
Java programming for high-performance numerical computing
IBM Systems Journal
Quasidynamic Layout Optimizations for Improving Data Locality
IEEE Transactions on Parallel and Distributed Systems
Automatic tiling of iterative stencil loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Generating cache hints for improved program efficiency
Journal of Systems Architecture: the EUROMICRO Journal
Improving whole-program locality using intra-procedural and inter-procedural transformations
Journal of Parallel and Distributed Computing
Multi-compilation: capturing interactions among concurrently-executing applications
Proceedings of the 3rd conference on Computing frontiers
Optimizing compiler for shared-memory multiple SIMD architecture
Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
Hi-index | 0.00 |
Global locality analysis is a technique for improving the cache performance of a sequence of loop nests through a combination of loop and data layout optimizations. Pure loop transformations are restricted by data dependences and may not be very successful in optimizing imperfectly nested loops; the impact of a data transformation on an array might be program-wide. Therefore, in this paper we argue for a combined approach which employs both loop and data transformations. The method enjoys the advantages of the most of the previous techniques for enhancing locality and is efficient. In our approach, the loop nests are processed one by one and the data layout constraints obtained from one nest are propagated for optimization of the remaining loop nests. We show that this process can be put in a simple matrix framework which can be manipulated by an optimizing compiler. The search space that we consider for possible loop transformations comprises general non-singular linear transformation matrices and the data layouts that we consider are those which can be expressed using hyperplanes. Experiments using several programs on an SGI Origin 2000 distributed-shared-memory machine demonstrate the efficacy of our approach.