The block-based trace cache

Authors:
Bryan Black; Bohuslav Rychlik; John Paul Shen
Affiliations:
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA; Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA; Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA
Venue:
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Year:
1999

References (works cited by this paper): 17
Cited by: 21

Improving the accuracy of dynamic branch prediction using branch correlation

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache

ICS '93 Proceedings of the 7th international conference on Supercomputing
Enhancing instruction scheduling with a block-structured ISA

International Journal of Parallel Programming
Next cache line and set prediction

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Optimization of instruction fetch mechanisms for high issue rates

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic path-based branch correlation

Proceedings of the 28th annual international symposium on Microarchitecture
Multiple-block ahead branch predictors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Increasing the instruction fetch rate via block-structured instruction set architectures

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Path-based next trace prediction

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Alternative fetch and issue policies for the trace cache fetch mechanism

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Trace processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Calibration of Microprocessor Performance Models

Computer
The PowerPC 604 RISC microprocessor

IEEE Micro
The PowerPC User Instruction Set Architecture

IEEE Micro
Multiple Branch and Block Prediction

HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
The Effects of Mispredicted-Path Execution on Branch Prediction Structures

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques

Completion time multiple branch prediction for enhancing trace cache performance

Proceedings of the 27th annual international symposium on Computer architecture
Instruction path coprocessors

Proceedings of the 27th annual international symposium on Computer architecture
PipeRench implementation of the instruction path coprocessor

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Micro-operation cache: a power aware frontend for the variable instruction length ISA

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Power reduction through work reuse

ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Performance characterization of a hardware mechanism for dynamic optimization

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Performance Evaluation of Exception Handling in I/O Libraries

DSN '01 Proceedings of the 2001 International Conference on Dependable Systems and Networks (formerly: FTCS)
Selecting long atomic traces for high coverage

ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Micro-operation cache: a power aware frontend for variable instruction length ISA

IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
A low-complexity fetch architecture for high-performance superscalar processors

ACM Transactions on Architecture and Code Optimization (TACO)
Execution cache-based microarchitecture power-efficient superscalar processors

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Increased Scalability and Power Efficiency by Using Multiple Speed Pipelines

Proceedings of the 32nd annual international symposium on Computer Architecture
The instruction register file micro-architecture

Future Generation Computer Systems - Special issue: Parallel computing technologies
Trace Cache Sampling Filter

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Block-aware instruction set architecture

ACM Transactions on Architecture and Code Optimization (TACO)
Evaluating trace cache energy efficiency

ACM Transactions on Architecture and Code Optimization (TACO)
Trace cache sampling filter

ACM Transactions on Computer Systems (TOCS)
The instruction register file micro-architecture

Future Generation Computer Systems - Special issue: Parallel computing technologies
Trace Cache Miss Rate

International Journal of Modelling and Simulation
Minimal Multi-threading: Finding and Removing Redundant Instructions in Multi-threaded Processors

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Do trace cache, value prediction and prefetching improve SMT throughput?

ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems

Abstract

The trace cache is a recently proposed solution for achieving high instruction fetch bandwidth by buffering and reusing dynamic instruction traces. This work presents a new block-based trace cache implementation that can achieve higher IPC performance with more efficient storage of traces. Instead of explicitly storing the instructions of a trace, pointers to the blocks constituting a trace are stored in a much smaller trace table. The block-based trace cache renames fetch addresses at the basic block level and stores aligned blocks in a block cache. Traces are constructed by accessing the replicated block cache using block pointers from the trace table. The performance potential of the block-based trace cache is quantified and compared with perfect branch prediction and perfect fetch schemes. Compared with the conventional trace cache, the block-based design can achieve higher IPC, with less impact on cycle time.

Results: Using the SPECint95 benchmarks, a 16-wide realistic design of a block-based trace cache can improve performance by 75% over a baseline design and comes within 7% of a baseline design with perfect branch prediction. With idealized trace prediction, it is shown that the block-based trace cache with a 1K-entry block cache achieves the same performance as the conventional trace cache with 32K entries.
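The storage idea in the abstract (traces held as block pointers rather than as copies of instructions) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class name `BlockBasedTraceCache`, its methods, and the choice to index the trace table by starting fetch address alone are all hypothetical, and the model ignores trace prediction, block cache replication, alignment, capacity, and replacement.

```python
from dataclasses import dataclass, field


@dataclass
class BlockBasedTraceCache:
    """Toy model: all names and policies here are illustrative assumptions."""
    # Block cache: block id -> instructions of one basic block.
    block_cache: dict = field(default_factory=dict)
    # Rename table: fetch address of a basic block -> small block id.
    rename_table: dict = field(default_factory=dict)
    # Trace table: starting fetch address -> tuple of block ids (pointers).
    trace_table: dict = field(default_factory=dict)

    def fill_block(self, fetch_addr, instructions):
        """Store a basic block once and return its renamed block id."""
        if fetch_addr not in self.rename_table:
            block_id = len(self.block_cache)
            self.rename_table[fetch_addr] = block_id
            self.block_cache[block_id] = list(instructions)
        return self.rename_table[fetch_addr]

    def record_trace(self, block_addrs):
        """Record a trace as block pointers instead of instruction copies."""
        ids = tuple(self.rename_table[addr] for addr in block_addrs)
        self.trace_table[block_addrs[0]] = ids

    def fetch(self, start_addr):
        """Rebuild a trace from the block cache, or return None on a miss."""
        ids = self.trace_table.get(start_addr)
        if ids is None:
            return None
        trace = []
        for block_id in ids:
            trace.extend(self.block_cache[block_id])
        return trace


cache = BlockBasedTraceCache()
cache.fill_block(0x100, ["add", "cmp", "beq"])
cache.fill_block(0x200, ["ld", "sub", "bne"])
cache.fill_block(0x300, ["st", "jmp"])

# Two traces share the block at 0x200; its instructions are stored only once.
cache.record_trace([0x100, 0x200])
cache.record_trace([0x300, 0x200])

trace_a = cache.fetch(0x100)
trace_b = cache.fetch(0x300)
miss = cache.fetch(0x999)
```

In this model the two traces reuse one stored copy of the shared block, so the trace table holds only a few small ids per trace. That is the intuition behind the abstract's claim of more efficient trace storage; the real design's indexing and replication details are not captured here.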