Automatic loop interchange

Authors:
John R. Allen; Ken Kennedy
Affiliations:
Rice University, Houston, Texas; Rice University, Houston, Texas
Venue:
SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Year:
1984


Abstract

Parallel and vector machines are becoming increasingly important to many computation-intensive applications. Effectively utilizing such architectures, particularly from sequential languages such as Fortran, has demanded increasingly sophisticated compilers. In general, a compiler needs to significantly reorder a program in order to generate code optimal for a specific architecture.

Because DO loops typically control the execution of a number of statements, the order in which loops are executed can dramatically affect the performance of a machine on a particular section of code. In particular, loop interchange can often be used to enhance the performance of code on parallel or vector machines.

Determining when loops may be safely and profitably interchanged requires a study of the data dependences in the program. This work discusses specific applications of dependence theory to loop interchange. The theory is described as it has been implemented in PFC (Parallel Fortran Converter), a program that attempts to uncover operations in sequential Fortran code that may be safely rewritten as vector operations.
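
To make the safety question concrete, the following is a minimal sketch in Python. It is not taken from the paper or from PFC. The helper names `run_nest` and `interchange_is_safe` are hypothetical, and the test applied is the standard two-loop condition: interchanging two loops of a perfect nest is unsafe when some dependence has direction vector (<, >) with respect to those loops, because interchange would reverse it. The sketch runs a two-deep nest in both loop orders and compares the results.

```python
def run_nest(n, src_di, src_dj, interchange=False):
    """Run A[i][j] = A[i + src_di][j + src_dj] + 1 over an n x n interior.

    The grid carries a one-cell border so neighbour reads stay in bounds.
    With interchange=True the j loop becomes the outer loop.
    """
    size = n + 2
    a = [[r * size + c for c in range(size)] for r in range(size)]
    interior = range(1, n + 1)
    if interchange:
        order = [(i, j) for j in interior for i in interior]
    else:
        order = [(i, j) for i in interior for j in interior]
    for i, j in order:
        a[i][j] = a[i + src_di][j + src_dj] + 1
    return a


def interchange_is_safe(direction_vectors):
    """Two-loop check (illustrative): unsafe if any dependence is (<, >)."""
    return all(d != ("<", ">") for d in direction_vectors)


# Reading A[i-1][j] gives a dependence with direction (<, =):
# both loop orders compute the same values.
same = run_nest(4, -1, 0) == run_nest(4, -1, 0, interchange=True)

# Reading A[i-1][j+1] gives a dependence with direction (<, >):
# after interchange the read sees a value that has not been written yet.
differs = run_nest(4, -1, 1) != run_nest(4, -1, 1, interchange=True)
```

The sketch covers only a single statement and exact direction vectors. A production compiler would also have to handle unknown directions, imperfect nests, and the question of whether an interchange is profitable, none of which is modelled here.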