On Some Implementation Issues for Value Prediction on Wide-Issue ILP Processors

Authors:
Sang-Jeong Lee; Pen-Chung Yew
Venue:
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Year:
2000

Abstract

In this paper, we examine two issues that could affect the performance of value prediction on wide-issue ILP processors. One is the large number of accesses to the value prediction tables needed in each machine cycle; the other is the latency required to update stale values in the value prediction tables. We introduce a prediction value cache (PVC), which augments the instruction cache to hold the prediction values. The PVC not only provides the bandwidth required to access multiple prediction values in each machine cycle, but also allows us to decouple value prediction from the critical path in the instruction fetch stage. We use a hybrid value predictor with dynamic classification to perform value prediction in the write-back stage, and assume a realistic number of read/write ports (e.g., 2 read/write ports) with queues in the prediction tables. Simulations show good performance for an 8-issue processor.

We also found that, in an 8-issue processor running SPECint95 benchmark programs, 36% of instructions access the same value prediction table entry again within 5 cycles, and 22% do so within 2 cycles. Unless the prediction tables can be updated quickly, especially for the Stride and Two-level types, those value predictions will read stale values and mostly result in mispredictions. We examine several schemes, such as attaching an age counter and using speculative update, to cope with the problem of delayed updates, but found them of limited effectiveness because of the latency required by dynamic classification. If that latency can be reduced, e.g., by using compiler analysis to determine access types at compile time, performance could be further improved.
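
The stale-value problem described above can be illustrated with a toy model. The sketch below is not the paper's simulator or predictor design: it assumes a single-entry stride predictor, a pure stride-4 value stream, and two hypothetical parameters, `reuse_gap` (cycles between successive accesses to the same table entry) and `update_delay` (cycles before a computed value becomes visible in the table). All names are chosen for illustration only.

```python
from collections import deque


class StridePredictor:
    """One-entry-per-PC stride predictor: predicts last_value + stride."""

    def __init__(self):
        self.table = {}  # pc -> (last_value, stride)

    def predict(self, pc):
        last, stride = self.table.get(pc, (0, 0))
        return last + stride

    def update(self, pc, actual):
        last, _ = self.table.get(pc, (actual, 0))
        self.table[pc] = (actual, actual - last)


def accuracy(values, reuse_gap, update_delay, pc=0x40):
    """Fraction of correct predictions when table updates land late.

    reuse_gap and update_delay are hypothetical knobs for this sketch,
    both measured in cycles.
    """
    predictor = StridePredictor()
    pending = deque()  # (cycle at which the update becomes visible, value)
    correct = 0
    for i, actual in enumerate(values):
        cycle = i * reuse_gap
        # Apply every update whose latency has elapsed by this cycle.
        while pending and pending[0][0] <= cycle:
            _, value = pending.popleft()
            predictor.update(pc, value)
        if predictor.predict(pc) == actual:
            correct += 1
        # The actual value reaches the table only after update_delay cycles.
        pending.append((cycle + update_delay, actual))
    return correct / len(values)


values = list(range(0, 400, 4))  # a pure stride-4 value stream
timely = accuracy(values, reuse_gap=5, update_delay=2)
stale = accuracy(values, reuse_gap=2, update_delay=5)
print(f"update before reuse: {timely:.2f}, update after reuse: {stale:.2f}")
```

In this model, when the update arrives before the entry is reused, the predictor locks onto the stride after a short warm-up. When the entry is reused before the update arrives, every prediction is based on a value that is several instances old, so the predictor is wrong almost every time even though the stream is perfectly regular.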