Software Trace Cache for Commercial Applications
International Journal of Parallel Programming
IEEE Transactions on Computers
Hi-index | 0.00 |
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetching program instructions in dynamic execution order, dramatically improves instruction fetch bandwidth. Similarly, program transformations like loop unrolling, procedure in-lining, feedback-directed program restructuring, and profile-directed feedback can improve instruction fetch bandwidth by changing the static structure and ordering of a program's basic blocks. We examine the interaction of these compile-time and run-time techniques in the context of a high-quality production compiler that implements such transformations and a cycle-accurate simulation model of a wide issue super-scalar processor. Not surprisingly, we find that the relative benefit of adding trace cache declines with increasing optimization level, and vice versa. Furthermore, we find that certain optimizations that improve performance on a processor model without trace cache can actually degrade performance on a processor with trace cache due to increased branch history table interference. Finally, we show that the performance obtained with a trace cache of a given size can be obtained with a trace cache of about half the size by applying aggressive compiler optimization techniques.