Improving instruction-level parallelism by loop unrolling and dynamic memory disambiguation
Proceedings of the 28th annual international symposium on Microarchitecture
An overview of a compiler for mapping software binaries to hardware
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The Analysis of Windows Vista Disk Encryption Algorithm
Proceeedings of the 22nd annual IFIP WG 11.3 working conference on Data and Applications Security
Loop transformations in the ahead-of-time optimization of java bytecode
CC'06 Proceedings of the 15th international conference on Compiler Construction
Hi-index | 0.00 |
A well-known code transformation for improving the execution performance of a program is loop unrolling. The unrolled loop usually requires fewer instruction executions than the original loop. The reduction in instruction executions comes from two sources: the number of branch instructions executed is reduced, and the index variable is modified fewer times. In addition, for architectures with features designed to exploit instruction-level parallelism, loop unrolling can expose greater levels of instruction-level parallelism. Loop unrolling is an effective code transformation often improving the execution performance of programs that spend much of their execution time in loops by 10 to 30 percent. Unfortunately, it has not been studied as extensively as other code improvements such as register allocation or common subexpression elimination. The result is that many compilers employ simplistic loop unrolling algorithms that miss many opportunities for improving run-time performance. This paper describes how aggressive loop unrolling is done in a retargetable optimizing compiler. Using a set of 32 benchmark programs, the effectiveness of this more aggressive approach to loop unrolling is evaluated. Interactions between loop unrolling and other common optimizations are also discussed.