MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Advanced compiler design and implementation
Advanced compiler design and implementation
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
Control Flow Driven Splitting of Loop Nests at the Source Code Level
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
How compilers and tools differ for embedded systems
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Equivalence checking of arithmetic expressions using fast evaluation
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Efficient dynamic voltage/frequency scaling through algorithmic loop transformation
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Hi-index | 0.00 |
We present a novel loop transformation technique, particularly well suited for optimizing embedded compilers, where an increase in compilation time is acceptable in exchange for significant performance increase. The transformation technique optimizes loops containing nested conditional blocks. Specifically, the transformation takes advantage of the fact that the Boolean value of the conditional expression, determining the true/false paths, can be statically analyzed using a novel interval analysis technique that can evaluate conditional expressions in the general polynomial form. Results from interval analysis combined with loop dependency information is used to partition the iteration space of the nested loop. In such cases, the loop nest is decomposed such as to eliminate the conditional test, thus substantially reducing the execution time. Our technique completely eliminates the conditional from the loops (unlike previous techniques) thus further facilitating the application of other optimizations and improving the overall speedup. Applying the proposed transformation technique on loop kernels taken from Mediabench, SPEC-2000, mpeg4, qsdpcm and gimp, on average we measured a 175% (1.75X) improvement of execution time when running on a SPARC processor, a 336% (4.36X) improvement of execution time when running on an Intel Core Duo processor and a 198.9% (2.98X) improvement of execution time when running on a PowerPC G5 processor.