A fill-unit approach to multiple instruction issue

Authors:
Manoj Franklin;Mark Smotherman
Affiliations:
Dept. of Elect. and Computer Eng., Clemson University, Clemson, SC;Dept. of Computer Science, Clemson University, Clemson, SC
Venue:
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Year:
1994

Citing 12
Cited 23

Checkpoint repair for high-performance out-of-order execution machines

IEEE Transactions on Computers
Hardware support for large atomic units in dynamically scheduled machines

MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Computer architecture: a quantitative approach

Computer architecture: a quantitative approach
Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers

IEEE Transactions on Computers
MIPS RISC architectures

MIPS RISC architectures
The SuperSPARC microprocessor

COMPCON '92 Proceedings of the thirty-seventh international conference on COMPCON
Alternative implementations of two-level adaptive branch prediction

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Interlock collapsing ALU for increased instruction-level parallelism

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
IBM Power and PowerPC

IBM Power and PowerPC
The Metaflow Architecture

IEEE Micro
Organization of the Motorola 88110 Superscalar RISC Microprocessor

IEEE Micro
A Development Environment for Horizontal Microcode

IEEE Transactions on Software Engineering

Dynamic rescheduling: a technique for object code compatibility in VLIW architectures

Proceedings of the 28th annual international symposium on Microarchitecture
Improving CISC instruction decoding performance using a fill unit

Proceedings of the 28th annual international symposium on Microarchitecture
Importance of profiling and compatibility

ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
A persistent rescheduled-page cache for low overhead object code compatibility in VLIW architectures

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences

Proceedings of the 24th annual international symposium on Computer architecture
Exploiting instruction level parallelism in processors by caching scheduled groups

Proceedings of the 24th annual international symposium on Computer architecture
DAISY: dynamic compilation for 100% architectural compatibility

Proceedings of the 24th annual international symposium on Computer architecture
Alternative fetch and issue policies for the trace cache fetch mechanism

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Reducing the performance impact of instruction cache misses by writing instructions into the reservation stations out-of-order

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
On high-bandwidth data cache design for multi-issue processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving trace cache effectiveness with branch promotion and trace packing

Proceedings of the 25th annual international symposium on Computer architecture
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A Trace Cache Microarchitecture and Evaluation

IEEE Transactions on Computers - Special issue on cache memory and related problems
MPS: Miss-Path Scheduling for Multiple-Issue Processors

IEEE Transactions on Computers
Instruction path coprocessors

Proceedings of the 27th annual international symposium on Computer architecture
Allowing for ILP in an embedded Java processor

Proceedings of the 27th annual international symposium on Computer architecture
Properties of Rescheduling Size Invariance for Dynamic Rescheduling-Based VLIW Cross-Generation Compatibility

IEEE Transactions on Computers
The Misprediction Recovery Cache

International Journal of Parallel Programming
Parallelism in the front-end

Proceedings of the 30th annual international symposium on Computer architecture
Exploiting compiler-generated schedules for energy savings in high-performance processors

Proceedings of the 2003 international symposium on Low power electronics and design
Low-power, low-complexity instruction issue using compiler assistance

Proceedings of the 19th annual international conference on Supercomputing
Trace-Based runtime instruction rescheduling for architecture extension

ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems

Quantified Score

Hi-index	0.01

Visualization

Abstract

Multiple issue of instructions occurs in superscalar and VLIW machines. This paper investigates a third type of machine design, which combines the advantages of code compatibility as in superscalars and the absence of complex dependency-checking logic from the decoder as in VLIW. In this design, a stream of scalar instructions is executed by the hardware and is simultaneously compacted into VLIW-type instructions, which are then stored in a structure called a shadow cache. When a shadow cache line contains the instructions requested by the fetch unit, the scalar instruction stream is preempted and all operations in the shadow cache line are simultaneously issued and executed. The mechanism that compacts instructions is called a fill unit, and was first proposed for dynamically compacting microoperations into large executable units by Melvin, Shebanow, and Patt in 1988. We have extended their approach to directly handle data dependencies, delayed branches, and speculative execution (using branch prediction). This approach is evaluated using the MIPS architecture, and a six-functional-unit machine is found to be 52 to 108% faster than a single-issue processor for unrecompiled SPECint92 benchmarks.