Quasidynamic Layout Optimizations for Improving Data Locality

Authors:
Ismail Kadayif;Mahmut Kandemir
Affiliations:
IEEE;IEEE
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2004

Citing 44
Cited 1

Supernode partitioning

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
New CPU benchmark suites from SPEC

COMPCON '92 Proceedings of the thirty-seventh international conference on COMPCON
Optimizing for parallelism and data locality

ICS '92 Proceedings of the 6th international conference on Supercomputing
To copy or not to copy: a compile-time technique for assessing when data copying should be used to eliminate cache conflicts

Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Cache interference phenomena

SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Communication optimizations for irregular scientific computations on distributed memory architectures

Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
SUIF: an infrastructure for research on parallelizing and optimizing compilers

ACM SIGPLAN Notices
Compiling for numa parallel machines

Compiling for numa parallel machines
Unifying data and control transformations for distributed shared-memory machines

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Tile size selection using cache organization and data layout

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Automatic data layout for high performance Fortran

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
A novel approach towards automatic data distribution

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Improving data locality with loop transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Data-centric multi-level blocking

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Non-singular data transformations: definition, validity and applications

ICS '97 Proceedings of the 11th international conference on Supercomputing
Data transformations for eliminating conflict misses

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
An efficient uniform run-time scheme for mixed regular-irregular applications

ICS '98 Proceedings of the 12th international conference on Supercomputing
A hyperplane based approach for optimizing spatial locality in loop nests

ICS '98 Proceedings of the 12th international conference on Supercomputing
Advanced compiler design and implementation

Advanced compiler design and implementation
Improving locality using loop and data transformations in an integrated framework

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Precise miss analysis for program transformations with caches of arbitrary associativity

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Automatic data layout for distributed-memory machines

ACM Transactions on Programming Languages and Systems (TOPLAS)
Improving cache performance in dynamic applications through data and computation reorganization at run time

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Improving memory hierarchy performance for irregular applications

ICS '99 Proceedings of the 13th international conference on Supercomputing
An integer linear programming approach for optimizing cache locality

ICS '99 Proceedings of the 13th international conference on Supercomputing
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks

ACM Transactions on Computer Systems (TOCS)
Dynamic data distribution with control flow analysis

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Compiler-directed selection of dynamic memory layouts

Proceedings of the ninth international symposium on Hardware/software codesign
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
The Paradigm Compiler for Distributed-Memory Multicomputers

Computer
Application Performance on the MIT Alewife Machine

Computer
Compiling Communication-Efficient Programs for Massively Parallel Machines

IEEE Transactions on Parallel and Distributed Systems
Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers

IEEE Transactions on Parallel and Distributed Systems
Automatic Partitioning of Data and Computations on Scalable Shared Memory Multiprocessors

ICPP '97 Proceedings of the international Conference on Parallel Processing
Reduction of Cache Coherence Overhead by Compiler Data Layout and Loop Transformation

Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
Automatic Selection of Dynamic Data Partitioning Schemes for Distributed-Memory Multicomputers

LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
A Matrix-Based Approach to the Global Locality Optimization Problem

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Integrating Loop and Data Transformations for Global Optimisation

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Localizing Non-Affine Array References

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Memory Hierarchy Management for Iterative Graph Structures

IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Optimizing supercompilers for supercomputers

Optimizing supercompilers for supercomputers

Partial access conflict-relieving programmable address shuffler for parallel memory system in multi-core processor

Microprocessors & Microsystems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Compiler-directed locality optimization techniques are effective in reducing the number of cycles spent in off-chip memory accesses. Recently, methods have been developed that transform memory layouts of data structures at compile-time to improve spatial locality of nested loops beyond current control-centric (loop nest-based) optimizations. Most of these data-centric transformations use a single static (program-wide) memory layout for each array. A disadvantage of these static layout-based locality enhancement strategies is that they might fail to optimize codes that manipulate arrays which demand different layouts in different parts of the code. In this paper, we introduce a new approach which extends current static layout optimization techniques by associating different memory layouts with the same array in different parts of the code. We call this strategy "quasidynamic layout optimization.驴 In this strategy, the compiler determines memory layouts (for different parts of the code) at compile time, but layout conversions occur at runtime. We show that the possibility of dynamically changing memory layouts during the course of execution adds a new dimension to the data locality optimization problem. Our strategy employs a static layout optimizer module as a building block and, by repeatedly invoking it for different parts of the code, it checks whether runtime layout modifications bring additional benefits beyond static optimization. Our experiments indicate significant improvements in execution time over static layout-based locality enhancing techniques.