A study of a C function inliner
Software—Practice & Experience
A portable global optimizer and linker
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Inline function expansion for compiling C programs
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Quick compilers using peephole optimization
Software—Practice & Experience
Instruction scheduling beyond basic blocks
IBM Journal of Research and Development
Ease: an environment for architecture study and experimentation
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Journal of the ACM (JACM)
ACM Computing Surveys (CSUR)
Communications of the ACM
Measurement and analysis of instruction use in the VAX-11/780
ISCA '82 Proceedings of the 9th annual symposium on Computer Architecture
An instruction timing model of CPU performance
ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
Isolation and analysis of optimization errors
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
VLIW compilation techniques in a superscalar environment
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Improving semi-static branch prediction by code replication
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Automatic isolation of compiler errors
ACM Transactions on Programming Languages and Systems (TOPLAS)
Improving the accuracy of static branch prediction using branch correlation
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Avoiding conditional branches by code replication
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Interprocedural conditional branch elimination
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Control CPR: a branch height reduction optimization for EPIC architectures
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
BPF+: exploiting global data-flow optimization in a generalized packet filter architecture
Proceedings of the conference on Applications, technologies, architectures, and protocols for computer communication
Handling irreducible loops: optimized node splitting versus DJ-graphs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Handling Irreducible Loops: Optimized Node Splitting vs. DJ-Graphs
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Automatic generation of peephole optimizations
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Computer algebra systems as mathematical optimizing compilers
Science of Computer Programming
Improving WCET by applying worst-case path optimizations
Real-Time Systems
Using Branch Correlation to Identify Infeasible Paths for Anomaly Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
The interprocedural express-lane transformation
CC'03 Proceedings of the 12th international conference on Compiler construction
Hi-index | 0.01 |
This study evaluates a global optimization technique that avoids unconditional jumps by replicating code. When implemented in the back-end of an optimizing compiler, this technique can be generalized to work on almost all instances of unconditional jumps, including those generated from conditional statements and unstructured loops. The replication method is based on the idea of finding a replacement for each unconditional jump which minimizes the growth in code size. This is achieved by choosing the shortest sequence of instructions as a replacement. Measurements taken from a variety of programs showed that not only the number of executed instructions decreased, but also that the total cache work was reduced (except for small caches) despite increases in code size. Pipelined and superscalar machines may also benefit from an increase in the average basic block size.