PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A fill-unit approach to multiple instruction issue
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Reducing branch costs via branch alignment
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Path-based next trace prediction
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
ICS '99 Proceedings of the 13th international conference on Supercomputing
Increasing the size of atomic instruction blocks using control flow assertions
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Skipper: a microarchitecture for exploiting control-flow independence
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Out-of-Order Instruction Fetch Using Multiple Sequencers
ICPP '02 Proceedings of the 2002 International Conference on Parallel Processing
Design and evaluation of a multiscalar processor
Design and evaluation of a multiscalar processor
LLVA: A Low-level Virtual Instruction Set Architecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Reducing Rename Logic Complexity for High-Speed and Low-Power Front-End Architectures
IEEE Transactions on Computers
Block-aware instruction set architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Wide and efficient trace prediction using the local trace predictor
Proceedings of the 20th annual international conference on Supercomputing
Evaluating trace cache energy efficiency
ACM Transactions on Architecture and Code Optimization (TACO)
Ginger: control independence using tag rewriting
Proceedings of the 34th annual international symposium on Computer architecture
Hi-index | 0.00 |
As processor back-ends get more aggressive, front-ends will have to scale as well. Although the back-ends of superscalar processors have continued to become more parallel, the front-ends remain sequential. This paper describes techniques for fetching and renaming multiple non-contiguous portions of the dynamic instruction stream in parallel using multiple fetch and rename units. It demonstrates that parallel front-ends are a viable alternative to high-performance sequential front-ends.Compared with an equivalently-sized trace cache, our technique increases cache bandwidth utilization by 17%, front-end throughput by 20%, and performance by 5%. Parallelism also enhances latency tolerance: a parallel front-end loses only 6% performance as the cache size is decreased from 128 KB to 8 KB, compared with a 50--65% performance loss for sequential fetch mechanisms.