Improving register allocation for subscripted variables

Authors:
David Callahan;Steve Carr;Ken Kennedy
Affiliations:
Cray, Inc., Seattle, WA;Michigan Technological University, Houghton, MI;Rice University, Houston, TX
Venue:
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Year:
2004


Abstract

Most conventional compilers fail to allocate array elements to registers because standard data-flow analysis treats arrays like scalars, making it impossible to analyze the definitions and uses of individual array elements. This deficiency is particularly troublesome for floating-point registers, which are most often used as temporary repositories for subscripted variables.

In this paper, we present a source-to-source transformation, called scalar replacement, that finds opportunities for reuse of subscripted variables and replaces the references involved by references to temporary scalar variables. The objective is to increase the likelihood that these elements will be assigned to registers by the coloring-based register allocators found in most compilers. In addition, we present transformations to improve the overall effectiveness of scalar replacement and show how these transformations can be applied in a variety of loop nest types. Finally, we present experimental results showing that these techniques are extremely effective, capable of achieving integer-factor speedups over code generated by good optimizing compilers of conventional design.
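
As a rough illustration of the idea of scalar replacement described in the abstract, the sketch below shows a simple recurrence loop before and after the transformation is applied by hand. The function names, the temporary `t`, and the choice of loop are assumptions made for this example only; they are not taken from the paper, and the paper's algorithm handles far more general reuse patterns than this one.

```c
#include <assert.h>
#include <stddef.h>

/* Before: a[i - 1] is read through a subscripted reference on every
 * iteration, even though the previous iteration has just stored it.
 * A compiler that treats the array as a single opaque object may
 * reload it from memory each time. */
void recurrence_plain(double *a, const double *b, size_t n) {
    assert(n >= 1); /* a[0] must exist as the starting value */
    for (size_t i = 1; i < n; i++) {
        a[i] = a[i - 1] + b[i];
    }
}

/* After hand-applied scalar replacement: the value that flows from one
 * iteration to the next is kept in the scalar temporary t (a
 * hypothetical name), which a coloring-based register allocator can
 * place in a floating-point register. The store to a[i] remains, but
 * the load of a[i - 1] is gone. */
void recurrence_scalar_replaced(double *a, const double *b, size_t n) {
    assert(n >= 1);
    double t = a[0]; /* initialize the temporary before the loop */
    for (size_t i = 1; i < n; i++) {
        t = t + b[i]; /* reuse the value computed last iteration */
        a[i] = t;
    }
}
```

Both functions perform the same additions in the same order and should leave the same values in `a`; the second version simply exposes the reuse as a scalar, which is the kind of opportunity the abstract says conventional data-flow analysis misses.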