Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
ICS '95 Proceedings of the 9th international conference on Supercomputing
Proceedings of the fifth workshop on I/O in parallel and distributed systems
IEEE Transactions on Parallel and Distributed Systems
Efficient Representation Scheme for Multidimensional Array Operations
IEEE Transactions on Computers
Improving the Performance of Out-of-Core Computations
ICPP '97 Proceedings of the international Conference on Parallel Processing
Memory System Support for Dynamic Cache Line Assembly
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
We present a unified approach to locality optimization that employs both data and control transformations. Data transformations include changing the array layout in memory. Control transformations involve changing the execution order of programs. We have developed new techniques for compiler optimizations for distributed shared-memory machines, although the same techniques can be used for sequential machines with a memory hierarchy. .pp Our compiler optimizations are based on an algebraic representation of data mappings and a new data locality model. We present a pure data transformation algorithm and an algorithm unifying data and control transformations. While there has been much work on control transformations, the opportunities for data transformations have been largely neglected. In fact, data transformations have the advantage of being applicable to programs that cannot be optimized with control transformations. The unified algorithm, which performs data and control transformations simultaneously, offers improvement over optimizations obtained by applying data and control transformations separately. .pp The experimental results using a set of applications on a parallel machine show that the new optimizations improve performance significantly. These results are further analyzed using locality metrics with instrumentation and simulation.