Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Improving trace cache effectiveness with branch promotion and trace packing
Proceedings of the 25th annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Hi-index | 0.00 |
As superscalar processors become increasingly wide, it is inevitable that the large set of instructions to be fetched every cycle will span multiple noncontiguous basic blocks. The mechanism to fetch, align, and pass this set of instructions down the pipeline must do so as efficiently as possible. The concept of trace cache has emerged as the most promising technique to meet this high-bandwidth, low-latency fetch requirement. A new fill unit scheme, the Sliding Window Fill Mechanism, is proposed as a method to efficiently populate the trace cache. This method exploits trace continuity and identifies probable start regions to improve trace cache hit rate. Simulation yields a 7% average hit rate increase over the Rotenberg fill mechanism. When combined with branch promotion, trace cache hit rates experienced a 19% average increase along with a 17% average improvement in fetch bandwidth.