On Table Bandwidth and Its Update Delay for Value Prediction on Wide-Issue ILP Processors
IEEE Transactions on Computers - Special issue on the parallel architecture and compilation techniques conference
Latency and energy aware value prediction for high-frequency processors
ICS '02 Proceedings of the 16th international conference on Supercomputing
On Augmenting Trace Cache for High-Bandwidth Value Prediction
IEEE Transactions on Computers
Enhancing memory level parallelism via recovery-free value prediction
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Detecting global stride locality in value streams
Proceedings of the 30th annual international symposium on Computer architecture
Scaling the issue window with look-ahead latency prediction
Proceedings of the 18th annual international conference on Supercomputing
Proceedings of the 31st annual international symposium on Computer architecture
Enhancing Memory-Level Parallelism via Recovery-Free Value Prediction
IEEE Transactions on Computers
Hi-index | 0.01 |
In this paper, we look at two issues, which could affect the performance of value prediction on wide-issue ILP processors. One is the large number of accesses to the value prediction tables needed in each machine cycle, and the other is the latency required to update stale values in the value prediction tables. We introduce a prediction value cache (PVC), which augments the instruction cache to hold the prediction values. Using the PVC, we not only can provide required bandwidth to access multiple prediction values needed in each machine cycle, but also allow us to decouple the value prediction from the critical path in the instruction fetch stage. We use a hybrid value predictor with dynamic classification to perform value prediction in the write back stage, and assume a realistic number of read/write ports, e.g. 2 read/write ports, with queues in their prediction tables. We found good performance for an 8-issue processor using simulations.We also found that, in an 8-Issue processor using SPECint95 benchmark programs, 36% of instructions will access the same value prediction table entry again within 5 cycles, and 22% of instructions will do that within 2 cycles. Unless the prediction tables can be quickly updated, especially for the Stride type and the Two-level type, those value predictions will get stale values and mostly result in mispredictions. We examine several schemes such as attaching an age counter and using speculative update to cope with the problem of delayed updates, but found them not as effective due to the latency required in dynamic classification. If such latency can be reduced, e.g. by using compiler analysis to determine access types at compiler time, the performance could be further improved.