Delinearization: an efficient way to break multiloop dependence equations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Optimal evaluation of array expressions on massively parallel machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
The Omega Library interface guide
The Omega Library interface guide
Detecting coarse-grain parallelism using an interprocedural parallelizing compiler
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Automatic data layout for high performance Fortran
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
A novel approach towards automatic data distribution
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Combining loop transformations considering caches and scheduling
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Data distribution support on distributed shared memory multiprocessors
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
A compiler algorithm for optimizing locality in loop nests
ICS '97 Proceedings of the 11th international conference on Supercomputing
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Optimizing direct threaded code by selective inlining
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
A hyperplane based approach for optimizing spatial locality in loop nests
ICS '98 Proceedings of the 12th international conference on Supercomputing
Memory size estimation for multimedia applications
Proceedings of the 6th international workshop on Hardware/software codesign
Advanced compiler design and implementation
Advanced compiler design and implementation
Improving locality using loop and data transformations in an integrated framework
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Schedule-independent storage mapping for loops
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Precise miss analysis for program transformations with caches of arbitrary associativity
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
An integer linear programming approach for optimizing cache locality
ICS '99 Proceedings of the 13th international conference on Supercomputing
Exact memory size estimation for array computations without loop unrolling
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
A compiler technique for improving whole-program locality
POPL '01 Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Dynamic data distribution with control flow analysis
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Data locality enhancement by memory reduction
ICS '01 Proceedings of the 15th international conference on Supercomputing
A unified framework for schedule and storage optimization
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Loop fusion for memory space optimization
Proceedings of the 14th international symposium on Systems synthesis
A precise inter-procedural data flow algorithm
POPL '81 Proceedings of the 8th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Compiling Communication-Efficient Programs for Massively Parallel Machines
IEEE Transactions on Parallel and Distributed Systems
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
IEEE Transactions on Parallel and Distributed Systems
Automatic Partitioning of Data and Computations on Scalable Shared Memory Multiprocessors
ICPP '97 Proceedings of the international Conference on Parallel Processing
Automatic Selection of Dynamic Data Partitioning Schemes for Distributed-Memory Multicomputers
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Proceedings of the 40th annual Design Automation Conference
Interprocedural Array Remapping
PACT '97 Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques
A Matrix-Based Approach to the Global Locality Optimization Problem
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Integrating Loop and Data Transformations for Global Optimisation
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Efficient Parallelization using Combined Loop and Data Transformations
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Data Relation Vectors: A New Abstraction for Data Optimizations
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
ICPP '99 Proceedings of the 1999 International Conference on Parallel Processing
Loop transformations for reducing data space requirements of resource-constrained applications
SAS'03 Proceedings of the 10th international conference on Static analysis
A methodology for procedure cloning
Computer Languages
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Hi-index | 0.00 |
Exploiting spatial and temporal locality is essential for obtaining high performance on modern computers. Writing programs that exhibit high locality of reference is difficult and error-prone. Compiler researchers have developed loop transformations that allow the conversion of programs to exploit locality. Recently, transformations that change the memory layouts of multi-dimensional arrays-called data transformations-have been proposed. Unfortunately, both data and loop transformations have some important drawbacks. In this work, we present an integrated framework that uses loop and data transformations in concert to exploit the benefits of both approaches while minimizing the impact of their disadvantages. Our approach works inter-procedurally on acyclic call graphs, uses profile data to eliminate layout conflicts, and is unique in its capability of resolving conflicting layout requirements of different references to the same array in the same nest and in different nests for regular array-based applications. The optimization technique presented in this paper has been implemented in a source-to-source translator. We evaluate its performance using standard benchmark suites and several math libraries (complete programs) with large input sizes. Our experimental results show that the proposed approach improves the performance of the applications optimized by using the current state-of-the-art techniques by 8.2% on the average. This reduction comes from three important characteristics of the technique, namely, resolving layout conflicts between references to the same array in a loop nest, determining a suitable order to propagate layout modifications across loop nests, and propagating layouts between different procedures in the program-all in a unified framework. The locality optimization technique presented in this paper tries to exploit locality in the innermost loop positions. This strategy, in most cases, generates dependence-free outer loops, which can be safely parallelized.