Evaluating trace cache energy efficiency

  • Authors:
  • Michele Co;Dee A. B. Weikle;Kevin Skadron

  • Affiliations:
  • University of Virginia, Charlottesville, Virginia;University of Virginia, Charlottesville, Virginia;University of Virginia, Charlottesville, Virginia

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Future fetch engines need to be energy efficient. Much research has focused on improving fetch bandwidth. In particular, previous research shows that storing concatenated basic blocks to form instruction traces can significantly improve fetch performance. This work evaluates whether this concatenating of basic blocks translates to significant energy-efficiency gains. We compare processor performance and energy efficiency in trace caches compared to instruction caches. We find that, although trace caches modestly outperform instruction cache only alternatives, it is branch-prediction accuracy that really determines performance and energy efficiency. When access delay and area restrictions are considered, our results show that sequential trace caches achieve very similar performance and energy efficiency results compared to instruction cache-based fetch engines and show that the trace cache's failure to significantly outperform the instruction cache-based fetch organizations stems from the poorer implicit branch prediction from the next-trace predictor at smaller areas. Because access delay limits the theoretical performance of the evaluated fetch engines, we also propose a novel ahead-pipelined next-trace predictor. Our results show that an STC fetch organization with a three-stage, ahead-pipelined next-trace predictor can achieve 5--17% IPC and 29% ED2 improvements over conventional, unpipelined organizations.