On Some Implementation Issues for Value Prediction on Wide-Issue ILP Processors

  • Authors:
  • Sang-Jeong Lee; Pen-Chung Yew

  • Venue:
  • PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
  • Year:
  • 2000

Abstract

In this paper, we look at two issues that could affect the performance of value prediction on wide-issue ILP processors. One is the large number of accesses to the value prediction tables needed in each machine cycle, and the other is the latency required to update stale values in the value prediction tables. We introduce a prediction value cache (PVC), which augments the instruction cache to hold the prediction values. The PVC not only provides the bandwidth required to access multiple prediction values in each machine cycle, but also allows us to decouple value prediction from the critical path of the instruction fetch stage. We use a hybrid value predictor with dynamic classification to perform value prediction in the write-back stage, and assume a realistic number of read/write ports (e.g., two) with queues on the prediction tables. Our simulations show good performance for an 8-issue processor. We also find that, on an 8-issue processor running the SPECint95 benchmark programs, 36% of instructions access the same value prediction table entry again within 5 cycles, and 22% of instructions do so within 2 cycles. Unless the prediction tables can be updated quickly, especially for the stride and two-level predictor types, those value predictions will read stale values and mostly result in mispredictions. We examine several schemes, such as attaching an age counter and using speculative update, to cope with the problem of delayed updates, but find them less effective than expected due to the latency required by dynamic classification. If such latency can be reduced, e.g., by using compiler analysis to determine access types at compile time, the performance could be further improved.
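
To make the abstract's mechanism concrete, the following is a minimal sketch of a hybrid value predictor with dynamic classification (last-value / stride classes, plus an unpredictable class), with prediction at fetch and a delayed table update at write-back. The table size, confidence policy, and classification rule here are illustrative assumptions, not the parameters used by the authors, and the paper's two-level component and PVC structure are omitted for brevity.

```c
/*
 * Sketch of a value prediction table (VPT) with dynamic classification.
 * All sizes and policies are assumptions for illustration only.
 */
#include <stdint.h>
#include <stdio.h>

#define VPT_ENTRIES 1024                 /* assumed table size (power of two) */

typedef enum { CLASS_UNPRED, CLASS_LAST, CLASS_STRIDE } vp_class_t;

typedef struct {
    uint64_t   last_value;               /* most recently committed result     */
    int64_t    stride;                   /* last observed delta                */
    uint8_t    confidence;               /* 2-bit saturating counter           */
    vp_class_t cls;                      /* dynamically chosen predictor class */
} vpt_entry_t;

static vpt_entry_t vpt[VPT_ENTRIES];

static inline unsigned vpt_index(uint64_t pc) {
    return (unsigned)(pc >> 2) & (VPT_ENTRIES - 1);
}

/* Predict the result of the instruction at 'pc'; returns 0 if unpredictable. */
int vp_predict(uint64_t pc, uint64_t *value) {
    vpt_entry_t *e = &vpt[vpt_index(pc)];
    if (e->cls == CLASS_UNPRED || e->confidence < 2)
        return 0;
    *value = (e->cls == CLASS_STRIDE) ? e->last_value + (uint64_t)e->stride
                                      : e->last_value;
    return 1;
}

/* Update at write-back with the actual result.  An update that arrives late
 * is exactly what lets a re-fetched instruction read a stale entry. */
void vp_update(uint64_t pc, uint64_t actual) {
    vpt_entry_t *e = &vpt[vpt_index(pc)];

    /* What the table would have predicted before seeing 'actual'. */
    uint64_t predicted = (e->cls == CLASS_STRIDE)
                       ? e->last_value + (uint64_t)e->stride
                       : e->last_value;
    if (predicted == actual) { if (e->confidence < 3) e->confidence++; }
    else                     { if (e->confidence > 0) e->confidence--; }

    /* Re-classify: prefer stride when a stable non-zero delta is observed. */
    int64_t new_stride = (int64_t)(actual - e->last_value);
    if (new_stride == 0)              e->cls = CLASS_LAST;
    else if (new_stride == e->stride) e->cls = CLASS_STRIDE;
    else                              e->cls = CLASS_UNPRED;

    e->stride     = new_stride;
    e->last_value = actual;
}

int main(void) {
    /* A simple strided sequence, e.g. a loop induction variable. */
    for (uint64_t i = 0, val = 100; i < 8; i++, val += 4) {
        uint64_t v;
        int hit = vp_predict(0x400100, &v);
        printf("iter %llu: %s\n", (unsigned long long)i,
               hit ? (v == val ? "correct" : "mispredict") : "no prediction");
        vp_update(0x400100, val);       /* update happens only after write-back */
    }
    return 0;
}
```

In this sketch the update is applied only after the "write-back" call; if the same static instruction were fetched again before that call completed, vp_predict would return the previous (stale) last value and stride, which is the delayed-update problem the abstract quantifies.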