Evaluating trace cache energy efficiency

Authors:
Michele Co;Dee A. B. Weikle;Kevin Skadron
Affiliations:
University of Virginia, Charlottesville, Virginia;University of Virginia, Charlottesville, Virginia;University of Virginia, Charlottesville, Virginia
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2006

Abstract

Future fetch engines need to be energy efficient. Much research has focused on improving fetch bandwidth; in particular, previous research shows that storing concatenated basic blocks as instruction traces can significantly improve fetch performance. This work evaluates whether concatenating basic blocks also translates into significant energy-efficiency gains. We compare the processor performance and energy efficiency of trace caches with those of instruction caches. We find that, although trace caches modestly outperform instruction-cache-only alternatives, it is branch-prediction accuracy that really determines performance and energy efficiency. When access delay and area restrictions are considered, our results show that sequential trace caches (STCs) achieve performance and energy efficiency very similar to those of instruction-cache-based fetch engines. The trace cache's failure to significantly outperform the instruction-cache-based fetch organizations stems from the poorer implicit branch prediction of the next-trace predictor at smaller areas. Because access delay limits the theoretical performance of the evaluated fetch engines, we also propose a novel ahead-pipelined next-trace predictor. Our results show that an STC fetch organization with a three-stage, ahead-pipelined next-trace predictor can achieve 5% to 17% IPC and 29% ED² (energy-delay-squared) improvements over conventional, unpipelined organizations.
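
The energy-delay-squared metric that the abstract reports can be illustrated with a small calculation. The sketch below is not taken from the paper: the helper names `ed2` and `improvement` are hypothetical, and the input numbers are invented, normalized values chosen only to show how the metric weights delay more heavily than energy.

```python
def ed2(energy: float, delay: float) -> float:
    """Energy-delay-squared product, E * D^2 (lower is better)."""
    return energy * delay ** 2


def improvement(baseline: float, candidate: float) -> float:
    """Fractional reduction of a lower-is-better metric relative to a baseline."""
    return (baseline - candidate) / baseline


# Hypothetical, normalized values for illustration only (not results from the paper).
base_energy, base_delay = 1.0, 1.0   # baseline fetch organization
cand_energy, cand_delay = 1.0, 0.9   # same energy, 10% shorter execution time

gain = improvement(ed2(base_energy, base_delay), ed2(cand_energy, cand_delay))
print(f"ED^2 improvement: {gain:.1%}")
```

Because delay enters squared, a 10% reduction in execution time at equal energy reduces the product by about 19% in this made-up case, which is why a moderate IPC gain can correspond to a noticeably larger gain in this metric.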