Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Estimating interlock and improving balance for pipelined architectures
Journal of Parallel and Distributed Computing
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Analysis of interprocedural side effects in a parallel programming environment
Proceedings of the 1st International Conference on Supercomputing
Strategies for cache and local memory management by global program transformation
Proceedings of the 1st International Conference on Supercomputing
Improving register allocation for subscripted variables
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Eliminating false data dependences using the Omega test
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Enhanced modulo scheduling for loops with conditional branches
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A practical data flow framework for array reference analysis and its use in optimizations
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Interprocedural constant propagation: a study of jump function implementation
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Scalar replacement in the presence of conditional control flow
Software—Practice & Experience
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
ACM Computing Surveys (CSUR)
ACM Transactions on Programming Languages and Systems (TOPLAS)
An integrated compilation and performance analysis environment for data parallel programs
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Load-reuse analysis: design and evaluation
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Data dependence analysis on multi-dimensional array references
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Read-after-read memory dependence prediction
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Global optimization by suppression of partial redundancies
Communications of the ACM
The parallel execution of DO loops
Communications of the ACM
Dependence Analysis for Supercomputing
Dependence Analysis for Supercomputing
Computer Methods for Mathematical Computations
Computer Methods for Mathematical Computations
Structure of Computers and Computations
Structure of Computers and Computations
Array Data Flow Analysis for Load-Store Optimizations in Superscalar Architectures
LCPC '95 Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing
Compact multi-dimensional kernel extraction for register tiling
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Combined ILP and register tiling: analytical model and optimization framework
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Hi-index | 0.00 |
This paper describes our experiments comparing multiple scalar replacement algorithms to evaluate their effectiveness on entire scientific application benchmarks within the context of a production-level compiler. We investigate at what point aggressive scalar replacement becomes detrimental and which dependence tests are necessary to give scalar replacement enough information to be effective. As many commercial optimizing compilers may include some version of scalar replacement as an optimization, it is important to determine how aggressive these algorithms need to be.Previously, no study has examined 'how much' scalar replacement is sufficient and effective within the context of an existing highly optimizing compiler. Our experiments show that, on whole programs, simple algorithms and simple dependence analysis capture nearly all opportunities for scalar replacement found in scientific application benchmarks. While additional aggressiveness may lead to some performance gain in some individual loops, it also leads to performance degradation too often to be worth the risk when considering entire applications. Algorithms restricted to value reuse over at most one loop iteration and to fully redundant array references give the best results.Our experiment further shows that scalar replacement is not only an effective optimization, but also a feasible one for commercial optimizers since the simple algorithms are not computationally expensive. Based upon our findings, we conclude that scalar replacement ought to be a part of any highly optimizing compiler because of its low cost and significant potential gain.