Speculating to reduce unnecessary power consumption

  • Authors: Enric Musoll
  • Affiliations: Tidal Networks, Inc., San Jose, CA
  • Venue: ACM Transactions on Embedded Computing Systems (TECS)
  • Year: 2003

Abstract

The power consumption of current processors keeps increasing in spite of aggressive circuit design techniques and process shrinks. One of the reasons for this increase is the complexity of the microarchitecture required to achieve the performance that each processor generation demands. Microarchitectural techniques such as branch prediction and on-chip level-two caches increase not only the power consumed by committed instructions, but also the useless power associated with block accesses whose results are not needed for the correct execution and commit of the instructions.

In this work, the accesses that a particular block receives are classified into four components, based on whether the accesses are performed by instructions on the correct path or the wrong (mispredicted) path, and on whether the results of the accesses are needed for the correct execution of the instructions. Of the four components, only one accounts for the useful accesses to the block, that is, accesses performed to correctly execute instructions that will be committed. The other three components account for the useless activity on the block. The simulations performed indicate that, if the useless power dissipation of a high-performance processor could be removed entirely with no performance degradation, the overall processor power consumption would be reduced by as much as 65% compared to the same processor in which all the blocks are accessed every cycle.

This work then proposes a microarchitectural technique that targets the reduction of the useless power dissipation. The technique consists of predicting whether the result of a particular block of logic will be useful for executing the instructions (regardless of whether the instructions are eventually committed). If the result is predicted to be useless, the block is disabled.

A case example is presented in which two blocks are predicted for low power: the on-chip L2 cache for instruction fetches and the branch target buffer (BTB). The IPC versus power-consumption design space is explored for a particular microprocessor architecture. Both the average and the peak power consumption are targeted. High-level estimations are done to show that it is plausible that the ideas described might produce a significant reduction in useless block accesses. As an example, 65% of the accesses to the L2 cache can be eliminated at a 0.2% IPC degradation, and about 5% of the accesses to the BTB can be saved at a penalty of a 0.7% IPC reduction.
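
The four-way classification and the idea of gating a block on a usefulness prediction can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the `Access` trace record, the component labels, and in particular the 2-bit saturating-counter predictor, which is a generic scheme and not necessarily the predictor used in the paper.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """One access to a block (hypothetical trace record, not from the paper)."""
    correct_path: bool   # issued by an instruction on the correct path
    result_needed: bool  # result was needed to execute that instruction


def classify(access: Access) -> str:
    """Map an access to one of the four components described in the abstract."""
    path = "correct" if access.correct_path else "wrong"
    need = "needed" if access.result_needed else "unneeded"
    return f"{path}-path/{need}"


def useless_fraction(trace) -> float:
    """Fraction of accesses outside the single useful component."""
    counts = Counter(classify(a) for a in trace)
    total = sum(counts.values())
    useful = counts["correct-path/needed"]
    return (total - useful) / total if total else 0.0


class UsefulnessPredictor:
    """Illustrative table of 2-bit saturating counters (an assumption)."""

    def __init__(self, entries: int = 16):
        # Start at "strongly useful" so a block is never disabled untrained.
        self.counters = [3] * entries

    def predict_useful(self, pc: int) -> bool:
        """Return False when the block access at this PC should be disabled."""
        return self.counters[pc % len(self.counters)] >= 2

    def update(self, pc: int, was_useful: bool) -> None:
        """Train the counter once the true usefulness of the access is known."""
        i = pc % len(self.counters)
        if was_useful:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


if __name__ == "__main__":
    trace = [Access(True, True), Access(True, False),
             Access(False, True), Access(False, False)]
    print(Counter(classify(a) for a in trace))
    print(useless_fraction(trace))
```

In this toy trace each of the four components occurs once, so only one access in four is useful. In a real design, a predictor that wrongly disables a needed access costs performance, which is the IPC versus power trade-off the abstract describes.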