Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order

Authors:
Jared Stark;Paul Racunas;Yale N. Patt
Affiliations:
Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, Michigan;Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, Michigan;Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, Michigan
Venue:
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Year:
1997

Citing 18
Cited 19

Cache Operations by MRU Change

IEEE Transactions on Computers
Program optimization for instruction caches

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Available instruction-level parallelism for superscalar and superpipelined machines

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Achieving high instruction cache performance with an optimizing compiler

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Single instruction stream parallelism is greater than two

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The expandable split window paradigm for exploiting fine-grain parallelsim

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Alternative implementations of two-level adaptive branch prediction

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Dynamic dependency analysis of ordinary programs

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An efficient resource-constrained global scheduling technique for superscalar and VLIW processors

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Column-associative caches: a technique for reducing the miss rate of direct-mapped caches

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The superblock: an effective technique for VLIW and superscalar compilation

The Journal of Supercomputing - Special issue on instruction-level parallelism
A fill-unit approach to multiple instruction issue

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
The difference-bit cache

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Exploiting choice: instruction fetch and issue on an implementable simultaneous multithreading processor

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Performance benefits of large execution atomic units in dynamically scheduled machines

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Cache Memories

ACM Computing Surveys (CSUR)

Improving trace cache effectiveness with branch promotion and trace packing

Proceedings of the 25th annual international symposium on Computer architecture
Analyzing the working set characteristics of branch execution

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Evaluation of Design Options for the Trace Cache Fetch Mechanism

IEEE Transactions on Computers - Special issue on cache memory and related problems
A scalable front-end architecture for fast instruction delivery

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Fetch directed instruction prefetching

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Architectural and compiler support for effective instruction prefetching: a cooperative approach

ACM Transactions on Computer Systems (TOCS)
Optimizations Enabled by a Decoupled Front-End Architecture

IEEE Transactions on Computers
Focusing processor policies via critical-path prediction

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Skipper: a microarchitecture for exploiting control-flow independence

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Instruction fetch deferral using static slack

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Parallelism in the front-end

Proceedings of the 30th annual international symposium on Computer architecture
Scalable selective re-execution for EDGE architectures

ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Energy-efficient and high-performance instruction fetch using a block-aware ISA

ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Dynamically configurable shared CMP helper engines for improved performance

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Block-aware instruction set architecture

ACM Transactions on Architecture and Code Optimization (TACO)
Improving the performance and power efficiency of shared helpers in CMPs

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Ginger: control independence using tag rewriting

Proceedings of the 34th annual international symposium on Computer architecture
Improving instruction delivery with a block-aware ISA

Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

In conventional processors, each instruction cache fetch brings in a group of instructions. Upon encountering an instruction cache miss, the processor will wait until the instruction cache miss is serviced before continuing to fetch any new instructions. This paper presents a new technique, called out-of-order issue, which allows the processor to temporarily ignore the instructions associated with the instruction cache miss. The processor attempts to fetch the instructions that follow the group of instructions associated with the miss. These instructions are then decoded and written into the processor's reservation stations. Later, after the instruction cache miss has been serviced, the instructions associated with the miss are decoded and written into the reservation stations. (We use the term issue to indicate the act of writing instructions into the reservation stations. With this technique, instructions are not written into the reservation stations in program order. Hence, the term out-of-order issue.) In this paper, we introduce the concept of out-of-order issue, describe its implementation, and present some initial data showing the performance gains possible with out-of-order issue.