Checkpoint repair for out-of-order execution machines
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Hardware support for large atomic units in dynamically scheduled machines
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Branch classification: a new mechanism for improving branch predictor performance
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
A fill-unit approach to multiple instruction issue
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
A comparative analysis of schemes for correlated branch prediction
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Improving CISC instruction decoding performance using a fill unit
Proceedings of the 28th annual international symposium on Microarchitecture
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Dynamic speculation and synchronization of data dependences
Proceedings of the 24th annual international symposium on Computer architecture
Alternative fetch and issue policies for the trace cache fetch mechanism
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving the accuracy and performance of memory communication through renaming
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving Branch Prediction Accuracy by Reducing Pattern History Table Interference
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A Trace Cache Microarchitecture and Evaluation
IEEE Transactions on Computers - Special issue on cache memory and related problems
Evaluation of Design Options for the Trace Cache Fetch Mechanism
IEEE Transactions on Computers - Special issue on cache memory and related problems
ICS '99 Proceedings of the 13th international conference on Supercomputing
A comparison of scalable superscalar processors
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Control independence in trace processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Completion time multiple branch prediction for enhancing trace cache performance
Proceedings of the 27th annual international symposium on Computer architecture
Increasing the size of atomic instruction blocks using control flow assertions
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Performance characterization of a hardware mechanism for dynamic optimization
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Branch Prediction Using Profile Data
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
A Comparative Study of Redundancy in Trace Caches (Research Note)
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Selecting long atomic traces for high coverage
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
The Ultrascalar Processor-An Asymptotically Scalable Superscalar Microarchitecture
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
Using Interaction Costs for Microarchitectural Bottleneck Analysis
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Proceedings of the 31st annual international symposium on Computer architecture
Interaction cost and shotgun profiling
ACM Transactions on Architecture and Code Optimization (TACO)
Improving trace cache hit rates using the sliding window fill mechanism and fill select table
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Improving trace cache hit rates using the sliding window fill mechanism and fill select table
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Energy-efficient and high-performance instruction fetch using a block-aware ISA
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
The instruction register file micro-architecture
Future Generation Computer Systems - Special issue: Parallel computing technologies
Block-aware instruction set architecture
ACM Transactions on Architecture and Code Optimization (TACO)
The instruction register file micro-architecture
Future Generation Computer Systems - Special issue: Parallel computing technologies
Do trace cache, value prediction and prefetching improve SMT throughput?
ARCS'06 Proceedings of the 19th international conference on Architecture of Computing Systems
Improving instruction delivery with a block-aware ISA
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Hi-index | 0.00 |
The increasing widths of superscalar processors are placing greater demands upon the fetch mechanism. The trace cache meets these demands by placing logically contiguous instructions in physically contiguous storage. As a result, the trace cache delivers instructions at a high rate by supplying multiple fetch blocks each cycle.In this paper, we examine two techniques to improve the number of instructions delivered each cycle by the trace cache. The first technique, branch promotion, dynamically converts strongly biased branches into branches with static predictions. Because these promoted branches require no dynamic prediction, the branch predictor suffers less from the negative effects of interference.Branch promotion unlocks the potential of the second technique: trace packing. With trace packing, trace segments are packed with as many instructions as will fit, without regard to naturally occurring fetch block boundaries. With both techniques, the effective fetch rate of the trace cache jumps up 17% over a trace cache which implements neither.On a machine where the execution engine has a very aggressive memory disambiguator, the performance of a machine using branch promotion and trace packing is on average 11% higher than a machine using neither technique.