Aggressive function inlining: preventing loop blockings in the instruction cache

Authors:
Yosi Ben Asher;Omer Boehm;Daniel Citron;Gadi Haber;Moshe Klausner;Roy Levin;Yousef Shajrawi
Affiliations:
IBM Research Lab in Haifa, Israel and Computer Science Department, Haifa University, Haifa, Israel;IBM Research Lab in Haifa, Israel and Computer Science Department, Haifa University, Haifa, Israel;IBM Research Lab in Haifa, Israel and Computer Science Department, Haifa University, Haifa, Israel;IBM Research Lab in Haifa, Israel and Computer Science Department, Haifa University, Haifa, Israel;IBM Research Lab in Haifa, Israel and Computer Science Department, Haifa University, Haifa, Israel;IBM Research Lab in Haifa, Israel and Computer Science Department, Haifa University, Haifa, Israel;IBM Research Lab in Haifa, Israel and Computer Science Department, Haifa University, Haifa, Israel
Venue:
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Year:
2008

Abstract

Aggressive function inlining can lead to significant improvements in execution time. This potential is reduced by extensive instruction cache (Icache) misses caused by the resulting code expansion. It is very difficult to predict which inlinings will cause Icache conflicts, because the exact location of code in the executable is known only after inlining is complete. In this work we propose a new method for selective inlining called "Icache Loop Blockings" (ILB). In ILB we allow only those inlinings that do not create multiple inlined copies of the same function in hot execution cycles. This prevents any increase in the Icache footprint. The method is significantly more aggressive than previous ones, and experiments show that it also performs better. Results on a server-level processor and on an embedded CPU, running SPEC CINT2000, show that the ILB scheme improves execution time by 10% in comparison to other inlining methods. This was achieved without bloating the size of the hot code executed at any single point of execution, which is crucial for the embedded processor domain. We have also considered the synergy between code reordering and inlining, focusing on how inlining can help code reordering. This aspect of inlining has not been studied in previous work.
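
The selection rule stated in the abstract (never place more than one inlined copy of the same function inside one hot execution cycle) can be illustrated with a small sketch. This is not the authors' algorithm: the `CallSite` record, the `hot_loop` identifier, the profiled `count`, and the hottest-first greedy order are all assumptions made for illustration, and a real implementation would work on the compiler's call graph and profile data.

```python
from collections import defaultdict
from typing import List, NamedTuple, Optional, Tuple


class CallSite(NamedTuple):
    """Hypothetical record describing one candidate call site."""
    caller: str
    callee: str
    hot_loop: Optional[str]  # id of the hot cycle containing the call, or None
    count: int               # profiled execution count (assumed to be available)


def select_inlinings(
    call_sites: List[CallSite],
) -> Tuple[List[CallSite], List[CallSite]]:
    """Greedily accept call sites, hottest first.

    A site is rejected only if accepting it would create a second inlined
    copy of the same callee inside the same hot cycle.
    """
    inlined_in_loop = defaultdict(set)  # loop id -> callees already inlined there
    accepted, rejected = [], []
    for site in sorted(call_sites, key=lambda s: s.count, reverse=True):
        if site.hot_loop is None:
            # Outside any hot cycle the rule places no restriction.
            accepted.append(site)
        elif site.callee in inlined_in_loop[site.hot_loop]:
            # A copy of this callee already lives in this hot cycle.
            rejected.append(site)
        else:
            inlined_in_loop[site.hot_loop].add(site.callee)
            accepted.append(site)
    return accepted, rejected
```

Under these assumptions, two calls to the same function inside one hot loop yield a single accepted inlining, while calls to that function from a different hot loop or from cold code are still accepted.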