A predictive decode filter cache for reducing power consumption in embedded processors

  • Authors:
  • Weiyu Tang, Arun Kejariwal, Alexander V. Veidenbaum, Alexandru Nicolau

  • Affiliations:
  • Center for Embedded Computer Systems, University of California, Irvine, Irvine, CA (all authors)

  • Venue:
  • ACM Transactions on Design Automation of Electronic Systems (TODAES)
  • Year:
  • 2007

Abstract

With advances in semiconductor technology, power management has become an increasingly important constraint in processor design. In embedded processors, instruction fetch and decode consume more than 40% of processor power, which calls for power minimization techniques targeting the fetch and decode stages of the pipeline. To this end, a filter cache has been proposed as an architectural extension for reducing power consumption. Placed between the CPU and the instruction cache (I-cache), a filter cache supplies the instruction stream with shorter access time and lower power consumption than the I-cache. Its downside is a possible performance loss on cache misses. In this article, we present a novel technique, the decode filter cache (DFC), for minimizing power consumption with minimal performance impact. The DFC stores decoded instructions, so a hit in the DFC eliminates both the instruction fetch and its subsequent decode, reducing processor power. We present a runtime approach for predicting whether the next fetch source is present in the DFC; when a miss is predicted, the I-cache is accessed directly to reduce the miss penalty. We classify instructions as cacheable or noncacheable depending on their decoded width, and we use a sectored cache design for the DFC so that cacheable and noncacheable instructions can coexist in a DFC sector, making efficient use of the cache space. Experimental results show that the DFC reduces processor power by 34% on average and that our next-fetch prediction mechanism reduces the miss penalty by more than 91%.
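
Since the abstract packs several mechanisms into prose (next-fetch prediction, cacheable/noncacheable classification by decoded width, and the sectored layout), a small sketch may help. The C fragment below is a hypothetical illustration, not the paper's implementation: the sizes, bit splits, and names (`dfc_sector_t`, `next_fetch_source`, `is_cacheable`) are assumptions, and the prediction is approximated by a direct sector probe.

```c
/* Minimal sketch of the DFC fetch-path decision, under assumed parameters.
 * All names, sizes, and bit splits are illustrative, not the paper's. */
#include <stdbool.h>
#include <stdint.h>

#define DFC_SECTORS  64   /* assumed number of DFC sectors */
#define SECTOR_LINES  4   /* assumed lines per sector */
#define MAX_DECODED   8   /* assumed fixed width of a decoded entry, in bytes */

typedef struct {
    uint32_t tag;                     /* one tag per sector (sectored design) */
    bool valid[SECTOR_LINES];         /* per-line valid bits */
    bool cacheable[SECTOR_LINES];     /* line holds a decoded (cacheable) entry */
    uint8_t decoded[SECTOR_LINES][MAX_DECODED];
} dfc_sector_t;

static dfc_sector_t dfc[DFC_SECTORS];

/* Classification: an instruction is cacheable only if its decoded form fits
 * in the fixed per-line width; wider instructions remain noncacheable but
 * can still occupy a line in the same sector. */
bool is_cacheable(unsigned decoded_width) {
    return decoded_width <= MAX_DECODED;
}

typedef enum { SRC_DFC, SRC_ICACHE } fetch_src_t;

/* Next-fetch prediction, approximated here by a direct sector probe:
 * a predicted DFC hit skips both fetch and decode; a predicted miss sends
 * the access straight to the I-cache, avoiding the penalty of probing the
 * DFC first. */
fetch_src_t next_fetch_source(uint32_t pc) {
    dfc_sector_t *s = &dfc[(pc >> 4) % DFC_SECTORS];
    unsigned line = (pc >> 2) % SECTOR_LINES;
    if (s->tag == (pc >> 10) && s->valid[line] && s->cacheable[line])
        return SRC_DFC;     /* decoded instruction available */
    return SRC_ICACHE;      /* fetch and decode normally */
}
```

In the paper the prediction is made ahead of the access rather than by probing the DFC itself; the probe above merely stands in for that mechanism to show how the predicted source steers the pipeline.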