An experimental evaluation of scalar replacement on scientific benchmarks

Authors:
Steve Carr;Philip Sweany
Affiliations:
Department of Computer Science, Michigan Technological University, Houghton, MI;Texas Instruments, P.O. Box 660199, MS/8649, Dallas, TX
Venue:
Software—Practice & Experience
Year:
2003

Abstract

This paper describes our experiments comparing multiple scalar replacement algorithms to evaluate their effectiveness on entire scientific application benchmarks within the context of a production-level compiler. We investigate at what point aggressive scalar replacement becomes detrimental and which dependence tests are necessary to give scalar replacement enough information to be effective. As many commercial optimizing compilers may include some version of scalar replacement as an optimization, it is important to determine how aggressive these algorithms need to be.

Previously, no study has examined 'how much' scalar replacement is sufficient and effective within the context of an existing highly optimizing compiler. Our experiments show that, on whole programs, simple algorithms and simple dependence analysis capture nearly all opportunities for scalar replacement found in scientific application benchmarks. While additional aggressiveness may lead to some performance gain in some individual loops, it also leads to performance degradation too often to be worth the risk when considering entire applications. Algorithms restricted to value reuse over at most one loop iteration and to fully redundant array references give the best results.

Our experiment further shows that scalar replacement is not only an effective optimization, but also a feasible one for commercial optimizers since the simple algorithms are not computationally expensive. Based upon our findings, we conclude that scalar replacement ought to be a part of any highly optimizing compiler because of its low cost and significant potential gain.
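
For readers unfamiliar with the transformation, the following is a minimal hand-written sketch of the simplest case the abstract mentions: a value reused across exactly one loop iteration. It is illustrative only and is not taken from the paper or from any of the algorithms it evaluates; the function names `prefix_sum_original` and `prefix_sum_scalar_replaced` are hypothetical.

```c
#include <stddef.h>

/* Original loop: each iteration loads a[i - 1] from memory, even though
 * the previous iteration has just computed and stored that same value. */
void prefix_sum_original(double *a, const double *b, size_t n) {
    for (size_t i = 1; i < n; i++) {
        a[i] = a[i - 1] + b[i];
    }
}

/* Same loop after scalar replacement applied by hand: the value that
 * flows from iteration i - 1 to iteration i is kept in the scalar t,
 * which a compiler can hold in a register. The store to a[i] remains,
 * but the load of a[i - 1] is gone. */
void prefix_sum_scalar_replaced(double *a, const double *b, size_t n) {
    if (n == 0) {
        return;
    }
    double t = a[0];            /* initial value of the reused element */
    for (size_t i = 1; i < n; i++) {
        t = t + b[i];           /* reuse last iteration's result */
        a[i] = t;               /* keep memory up to date */
    }
}
```

Both functions perform the same floating-point additions in the same order, so they should produce identical arrays. In a compiler, dependence analysis is what establishes that `a[i - 1]` in one iteration is the element written in the preceding one, which is why the abstract ties the effectiveness of scalar replacement to the dependence tests available.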