Cache Operations by MRU Change
IEEE Transactions on Computers
Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Single instruction stream parallelism is greater than two
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The expandable split window paradigm for exploiting fine-grain parallelsim
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Dynamic dependency analysis of ordinary programs
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An efficient resource-constrained global scheduling technique for superscalar and VLIW processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
A fill-unit approach to multiple instruction issue
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Performance benefits of large execution atomic units in dynamically scheduled machines
ICS '89 Proceedings of the 3rd international conference on Supercomputing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ACM Computing Surveys (CSUR)
Improving trace cache effectiveness with branch promotion and trace packing
Proceedings of the 25th annual international symposium on Computer architecture
Analyzing the working set characteristics of branch execution
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Evaluation of Design Options for the Trace Cache Fetch Mechanism
IEEE Transactions on Computers - Special issue on cache memory and related problems
A scalable front-end architecture for fast instruction delivery
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Fetch directed instruction prefetching
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Architectural and compiler support for effective instruction prefetching: a cooperative approach
ACM Transactions on Computer Systems (TOCS)
Optimizations Enabled by a Decoupled Front-End Architecture
IEEE Transactions on Computers
Focusing processor policies via critical-path prediction
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Skipper: a microarchitecture for exploiting control-flow independence
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Instruction fetch deferral using static slack
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 30th annual international symposium on Computer architecture
Scalable selective re-execution for EDGE architectures
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Energy-efficient and high-performance instruction fetch using a block-aware ISA
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Dynamically configurable shared CMP helper engines for improved performance
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Block-aware instruction set architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Improving the performance and power efficiency of shared helpers in CMPs
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Ginger: control independence using tag rewriting
Proceedings of the 34th annual international symposium on Computer architecture
Improving instruction delivery with a block-aware ISA
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Hi-index | 0.00 |
In conventional processors, each instruction cache fetch brings in a group of instructions. Upon encountering an instruction cache miss, the processor will wait until the instruction cache miss is serviced before continuing to fetch any new instructions. This paper presents a new technique, called out-of-order issue, which allows the processor to temporarily ignore the instructions associated with the instruction cache miss. The processor attempts to fetch the instructions that follow the group of instructions associated with the miss. These instructions are then decoded and written into the processor's reservation stations. Later, after the instruction cache miss has been serviced, the instructions associated with the miss are decoded and written into the reservation stations. (We use the term issue to indicate the act of writing instructions into the reservation stations. With this technique, instructions are not written into the reservation stations in program order. Hence, the term out-of-order issue.) In this paper, we introduce the concept of out-of-order issue, describe its implementation, and present some initial data showing the performance gains possible with out-of-order issue.