Using cache line coloring to perform aggressive procedure inlining

Authors:
Hakan Aydin;David Kaeli
Affiliations:
Department of Electrical and Computer Engineering, Northeastern University, Boston, MA;Department of Electrical and Computer Engineering, Northeastern University, Boston, MA
Venue:
ACM SIGARCH Computer Architecture News - Special issue on interaction between compilers and computer architectures
Year:
2000

Abstract

Memory hierarchy performance has always been an important issue in computer architecture design. The likelihood of a bottleneck in the memory hierarchy is increasing, as improvements in microprocessor performance continue to outpace those made in the memory system. As a result, effective utilization of cache memories is essential in today's architectures.

The nature of procedural software poses visibility problems when attempting to perform program optimization. One approach to increasing visibility in procedural design is to perform procedure inlining. The main downside of using inlining is that inlined procedures can place excess pressure on the instruction cache.

To address this issue we attempt to perform code reordering. By combining reordering with aggressive inlining, a larger executable image produced through inlining can be effectively remapped onto the cache address space, while not noticeably increasing the instruction cache miss rate.

In this paper, we evaluate our ability to perform aggressive inlining by employing cache line coloring. We have implemented three variations of our coloring algorithm in the Alto toolset and compare them against Alto's aggressive basic block reordering algorithms. Alto allows us to generate optimized executables that can be run on hardware to generate results. We find that by using our algorithms, we can achieve up to a 21% reduction in execution runtime over the base Compaq optimizing compiler, and a 6.4% reduction when compared to Alto's interprocedural basic block reordering algorithm.
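The abstract does not spell out the coloring algorithm, so the following is only a minimal sketch of the general idea behind cache line coloring, not the authors' implementation. It assumes a direct-mapped instruction cache; the cache geometry (`CACHE_SIZE`, `LINE_SIZE`) and the helper names `colors`, `conflicts`, and `place_after` are hypothetical and chosen for illustration. Each cache line index is treated as a "color", and a callee is placed at an address whose colors do not overlap those of its caller.

```python
# Illustrative sketch only: the cache geometry and the greedy placement rule
# below are assumptions, not values or algorithms taken from the paper.
CACHE_SIZE = 8 * 1024   # bytes, assumed direct-mapped instruction cache
LINE_SIZE = 32          # bytes per cache line (assumed)
NUM_LINES = CACHE_SIZE // LINE_SIZE


def colors(start, size):
    """Return the set of cache line indices ("colors") occupied by code
    placed at byte address `start` with length `size`."""
    if size <= 0:
        return set()
    first = start // LINE_SIZE
    last = (start + size - 1) // LINE_SIZE
    # Code at least as large as the cache touches every line.
    if last - first + 1 >= NUM_LINES:
        return set(range(NUM_LINES))
    return {line % NUM_LINES for line in range(first, last + 1)}


def conflicts(proc_a, proc_b):
    """Cache lines that two (start, size) placements both map to."""
    return colors(*proc_a) & colors(*proc_b)


def place_after(caller, callee_size, search_from):
    """Greedy placement: return the first line-aligned address at or after
    `search_from` where the callee shares no colors with the caller, or
    None if no such address exists within one cache-sized span."""
    start = -(-search_from // LINE_SIZE) * LINE_SIZE  # round up to a line boundary
    for _ in range(NUM_LINES):
        if not conflicts(caller, (start, callee_size)):
            return start
        start += LINE_SIZE
    return None
```

Under these assumed parameters, a 256-byte caller at address 0 and a 128-byte callee at address 8192 collide on cache lines 0 through 3, and `place_after((0, 256), 128, 8192)` moves the callee to address 8448, where the two no longer share a line. A real reordering pass would also weigh call frequencies and handle set-associative caches, which this sketch ignores.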