Load-reuse analysis: design and evaluation

Authors:
Rastislav Bodík;Rajiv Gupta;Mary Lou Soffa
Affiliations:
Dept. of Computer Science, University of Pittsburgh, Pittsburgh, PA;Dept. of Computer Science, University of Pittsburgh, Pittsburgh, PA;Dept. of Computer Science, University of Pittsburgh, Pittsburgh, PA
Venue:
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Year:
1999

Abstract

Load-reuse analysis finds instructions that repeatedly access the same memory location. This location can be promoted to a register, eliminating redundant loads by reusing the results of prior memory accesses. This paper develops a load-reuse analysis and designs a method for evaluating its precision.

In designing the analysis, we aim for completeness, that is, exposing all reuse that can be harvested by a subsequent program transformation. For register promotion, a suitable transformation is partial redundancy elimination (PRE). To approach the ideal goal of PRE-completeness, the load-reuse analysis is phrased as a data-flow problem on a path-sensitive program representation: it detects reuse even when the reused value originates in a different instruction along each control flow path. Furthermore, the analysis is comprehensive, as it treats scalar, array, and pointer-based loads uniformly.

In evaluating the analysis, we compare it with an ideal analysis. By observing the run-time stream of memory references, we collect all PRE-exploitable reuse and treat it as the performance of the ideal analysis. To compare the (static) load-reuse analysis with the (dynamic) ideal reuse, we use an estimator algorithm that computes, given a data-flow solution and a program profile, the dynamic amount of reuse detected by the analysis. We developed a family of estimators that differ in how well they bound the profiling error inherent in the edge profile. By bounding the error, the estimators offer a precise and practical method for determining the run-time optimization benefit.

Our experiments show that about 55% of the loads executed in SPEC95 exhibit reuse. Of those, our analysis exposes about 80%.
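
The dynamic side of the evaluation, counting reuse in a run-time stream of memory references, can be illustrated with a deliberately simplified sketch. It is not the paper's measurement: the function name `count_load_reuse`, the `(kind, address)` trace format, and the rule that a load counts as reuse whenever its address was touched by any earlier load or store are all assumptions made for illustration. In particular, the sketch ignores whether the reuse is actually PRE-exploitable, so it gives only an optimistic upper bound on that quantity.

```python
from typing import Iterable, Tuple

# Hypothetical trace format: each event is (kind, address),
# where kind is "load" or "store".
Event = Tuple[str, int]


def count_load_reuse(trace: Iterable[Event]) -> Tuple[int, int]:
    """Return (reusing_loads, total_loads) for a memory-reference trace.

    Simplifying assumption: a load counts as reuse if the same address
    was accessed by any earlier load or store in the trace. Register
    pressure and PRE-exploitability are not modeled.
    """
    seen = set()   # addresses touched so far
    reusing = 0    # loads whose address was already seen
    total = 0      # all loads in the trace
    for kind, addr in trace:
        if kind == "load":
            total += 1
            if addr in seen:
                reusing += 1
        elif kind != "store":
            raise ValueError(f"unknown event kind: {kind!r}")
        seen.add(addr)
    return reusing, total


# Small made-up trace: two of the four loads hit an address seen earlier.
example_trace = [
    ("load", 0x10),
    ("load", 0x10),   # reuse of the first load
    ("store", 0x20),
    ("load", 0x20),   # reuse of the stored value
    ("load", 0x30),   # first access, no reuse
]
reusing, total = count_load_reuse(example_trace)
print(f"{reusing} of {total} loads exhibit reuse")
```

A static analysis would then be judged by how much of this dynamically observed reuse it exposes, which is the comparison the abstract reports as the roughly 80% figure.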