Improving trace cache effectiveness with branch promotion and trace packing

Authors:
Sanjay Jeram Patel;Marius Evers;Yale N. Patt
Affiliations:
Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, Michigan;Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, Michigan;Advanced Computer Architecture Laboratory, Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, Michigan
Venue:
Proceedings of the 25th annual international symposium on Computer architecture
Year:
1998

Abstract

The increasing widths of superscalar processors are placing greater demands upon the fetch mechanism. The trace cache meets these demands by placing logically contiguous instructions in physically contiguous storage. As a result, the trace cache delivers instructions at a high rate by supplying multiple fetch blocks each cycle.

In this paper, we examine two techniques to improve the number of instructions delivered each cycle by the trace cache. The first technique, branch promotion, dynamically converts strongly biased branches into branches with static predictions. Because these promoted branches require no dynamic prediction, the branch predictor suffers less from the negative effects of interference.

Branch promotion unlocks the potential of the second technique: trace packing. With trace packing, trace segments are packed with as many instructions as will fit, without regard to naturally occurring fetch block boundaries. With both techniques, the effective fetch rate of the trace cache jumps up 17% over a trace cache which implements neither.

On a machine where the execution engine has a very aggressive memory disambiguator, the performance of a machine using branch promotion and trace packing is on average 11% higher than a machine using neither technique.
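The interaction between the two techniques can be illustrated with a small sketch. It is not the authors' implementation: the promotion threshold (64 consecutive same-direction outcomes), the segment capacity (16 instructions), the limit of 3 dynamically predicted branches per segment, and the names `BiasTable` and `build_segments` are all assumptions made for illustration. The abstract states only that strongly biased branches are promoted and that packed segments ignore fetch block boundaries.

```python
class BiasTable:
    """Hypothetical per-branch bias tracker (all parameters are assumptions)."""

    def __init__(self, threshold=64):
        self.threshold = threshold
        self.entries = {}  # branch pc -> (last direction, run length)

    def update(self, pc, taken):
        last, run = self.entries.get(pc, (taken, 0))
        # Extend the run on a repeated direction, otherwise restart it.
        self.entries[pc] = (taken, run + 1 if last == taken else 1)

    def is_promoted(self, pc):
        # A branch counts as promoted once its run reaches the threshold.
        return self.entries.get(pc, (None, 0))[1] >= self.threshold


def build_segments(blocks, promoted, pack, capacity=16, max_branches=3):
    """Group fetch blocks into trace segments and return the segment sizes.

    blocks:   list of (branch_pc, n_instrs); each block ends in a branch.
    promoted: set of branch pcs that need no dynamic prediction.
    pack:     if True, split blocks so that each segment is filled.
    """
    segments = []
    size = branches = 0

    def flush():
        nonlocal size, branches
        if size:
            segments.append(size)
        size = branches = 0

    for pc, n in blocks:
        remaining = n
        while remaining:
            room = capacity - size
            if remaining > room:
                # Block does not fit. With packing, take what fits; without
                # packing, split only a block larger than an empty segment.
                if pack or size == 0:
                    size += room
                    remaining -= room
                flush()
                continue
            size += remaining
            remaining = 0
            if pc not in promoted:
                branches += 1  # only non-promoted branches use the limit
            if size == capacity or branches == max_branches:
                flush()
    flush()
    return segments


blocks = [(pc, 6) for pc in range(8)]  # eight 6-instruction fetch blocks
print(build_segments(blocks, promoted=set(), pack=False))
print(build_segments(blocks, promoted=set(), pack=True))
print(build_segments(blocks, promoted=set(range(8)), pack=True))
```

In this toy stream, packing alone still produces short segments because the assumed branch limit ends a segment before it is full. Once the branches are promoted, every segment reaches capacity, which is one way to read the abstract's remark that branch promotion unlocks the potential of trace packing.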