Data Relation Vectors: A New Abstraction for Data Optimizations

Authors:
Mahmut Kendemir;J. Ramanujam
Affiliations:
Pennsylvania State Univ., University Park;Louisiana State Univ., Baton Rougue
Venue:
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Year:
2001

Citing 41
Cited 3

Stencils and problem partitionings: their influence on the performance of multiple processor systems

IEEE Transactions on Computers
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems

Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Strategies for cache and local memory management by global program transformation

Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
Supernode partitioning

POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors

Proceedings of the 1989 ACM/IEEE conference on Supercomputing
The cache performance and optimizations of blocked algorithms

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Fortran at ten gigaflops: the connection machine convolution compiler

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Optimizing for parallelism and data locality

ICS '92 Proceedings of the 6th international conference on Supercomputing
Communication optimizations for irregular scientific computations on distributed memory architectures

Journal of Parallel and Distributed Computing - Special issue on scalability of parallel algorithms and architectures
Compiling for numa parallel machines

Compiling for numa parallel machines
Unifying data and control transformations for distributed shared-memory machines

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Tile size selection using cache organization and data layout

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
The Omega Library interface guide

The Omega Library interface guide
Automatic data layout for high performance Fortran

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
A novel approach towards automatic data distribution

Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Improving data locality with loop transformations

ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic optimization of communication in compiling out-of-core stencil codes

ICS '96 Proceedings of the 10th international conference on Supercomputing
Combining loop transformations considering caches and scheduling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Data-centric multi-level blocking

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Data transformations for eliminating conflict misses

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Evaluating MMX technology using DSP and multimedia applications

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Improving locality using loop and data transformations in an integrated framework

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Improving Cache Locality by a Combination of Loop and Data Transformations

IEEE Transactions on Computers - Special issue on cache memory and related problems
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts

IEEE Transactions on Parallel and Distributed Systems
Improving cache performance in dynamic applications through data and computation reorganization at run time

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Improving memory hierarchy performance for irregular applications

ICS '99 Proceedings of the 13th international conference on Supercomputing
Compiler algorithms for optimizing locality and parallelism on shared and distributed-memory machines

Journal of Parallel and Distributed Computing
Dynamic data distribution with control flow analysis

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Performance analysis using the MIPS R10000 performance counters

Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Image and Video Compression Standards: Algorithms and Architectures

Image and Video Compression Standards: Algorithms and Architectures
High Performance Compilers for Parallel Computing

High Performance Compilers for Parallel Computing
Compiling Communication-Efficient Programs for Massively Parallel Machines

IEEE Transactions on Parallel and Distributed Systems
Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers

IEEE Transactions on Parallel and Distributed Systems
Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors

IEEE Transactions on Parallel and Distributed Systems
Automatic Partitioning of Data and Computations on Scalable Shared Memory Multiprocessors

ICPP '97 Proceedings of the international Conference on Parallel Processing
Reuse-Driven Tiling for Data Locality

LCPC '97 Proceedings of the 10th International Workshop on Languages and Compilers for Parallel Computing
Integrating Loop and Data Transformations for Global Optimisation

PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Localizing Non-Affine Array References

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Automatic Blocking of Nested Loops

Automatic Blocking of Nested Loops
Automatic computation and data decomposition for multiprocessors

Automatic computation and data decomposition for multiprocessors

Automatic Partitioning of Parallel Loops with Parallelepiped-Shaped Tiles

IEEE Transactions on Parallel and Distributed Systems
Locality optimization in wireless applications

CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
SARA: StreAm register allocation

CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis

Quantified Score

Hi-index	0.00

Visualization

Abstract

We present an abstraction, called data relation vectors, to improve the data access characteristics and memory layouts in regular computations. The key idea is to define a relation between the data elements accessed by close-by iterations and use this relation to guide to a number of optimizations for array-based computations. The specific optimizations studied in this paper include enhancing group-spatial and self-spatial reuses and improving intratile and intertile data reuses. In addition, this abstraction works well with other known abstractions such as data reuse vectors. We also present a unified scheme for optimizing the memory performance of programs using this new abstraction in conjunction with reuse vectors. The data relation vector abstraction has been implemented in the SUIF compilation framework and has been tested using a set of 12 benchmarks from image processing and scientific computation domains. Preliminary results on a superscalar processor show that it is successful in reducing compilation time and outperforms two previously proposed techniques, one that uses only loop transformations and one that uses both loop and data transformations. Our experiments also show that the proposed abstraction helps one to select good data tile shapes which can subsequently be used to determine iteration space tiles.