The increasing widths of superscalar processors are placing greater demands on the fetch mechanism. The trace cache meets these demands by placing logically contiguous instructions in physically contiguous storage, which allows it to supply multiple fetch blocks each cycle. In this paper, we examine two fetch and issue techniques, partial matching and inactive issue, that improve the overall performance of the trace cache by improving the effective fetch rate. We show that, for the SPECint95 benchmarks, partial matching increases overall performance by 12% and inactive issue increases it by 15%. Furthermore, we apply these two techniques to issue blocks from trace segments that contain multiple execution paths. We conclude with a performance comparison between a trace cache implementing partial matching and inactive issue and an aggressive single-block fetch mechanism. The trace cache increases performance by an average of 25% over the instruction cache.
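The abstract does not spell out how the two techniques work, so the following is only a minimal sketch under stated assumptions: a trace segment is modeled as a list of fetch blocks plus the branch direction recorded at the end of each block except the last, and the predictor supplies one direction per branch. Under that model, partial matching supplies the leading blocks whose recorded directions agree with the predictions, and inactive issue is modeled as sending the remaining blocks along marked inactive. The names `TraceSegment` and `split_segment` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TraceSegment:
    """Hypothetical model of one trace cache line (not the paper's format)."""
    start_addr: int          # fetch address that selects this segment
    blocks: List[str]        # fetch blocks in trace order
    branch_dirs: List[bool]  # recorded direction of the branch ending each
                             # block except the last (True = taken)


def split_segment(segment: TraceSegment,
                  fetch_addr: int,
                  predictions: List[bool]) -> Tuple[List[str], List[str]]:
    """Return (active, inactive) blocks for a single fetch.

    Assumed behavior: the first block is always on the predicted path when
    the address matches; each later block is active only if every earlier
    branch direction in the segment agrees with the prediction.
    """
    if fetch_addr != segment.start_addr:
        return [], []  # address mismatch: modeled as a trace cache miss

    matched = 1
    for recorded, predicted in zip(segment.branch_dirs, predictions):
        if recorded != predicted:
            break
        matched += 1

    # Blocks past the matching prefix are the candidates for inactive issue.
    return segment.blocks[:matched], segment.blocks[matched:]


segment = TraceSegment(0x100, ["A", "B", "C"], [True, False])
active, inactive = split_segment(segment, 0x100, [True, True])
print(active, inactive)
```

In this sketch the second prediction disagrees with the recorded path, so blocks `A` and `B` are supplied as active and `C` is left as the inactive remainder. A fetch unit without partial matching would presumably have to treat the same lookup as a miss, which is the loss in effective fetch rate the abstract refers to.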