Checkpoint repair for high-performance out-of-order execution machines
IEEE Transactions on Computers
Hardware support for large atomic units in dynamically scheduled machines
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
IEEE Transactions on Computers
MIPS RISC architectures
COMPCON '92 Proceedings of the thirty-seventh international conference on COMPCON
Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Interlock collapsing ALU for increased instruction-level parallelism
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
IBM Power and PowerPC
IEEE Micro
A Development Environment for Horizontal Microcode
IEEE Transactions on Software Engineering
Dynamic rescheduling: a technique for object code compatibility in VLIW architectures
Proceedings of the 28th annual international symposium on Microarchitecture
Improving CISC instruction decoding performance using a fill unit
Proceedings of the 28th annual international symposium on Microarchitecture
Importance of profiling and compatibility
ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
A persistent rescheduled-page cache for low overhead object code compatibility in VLIW architectures
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Improving superscalar instruction dispatch and issue by exploiting dynamic code sequences
Proceedings of the 24th annual international symposium on Computer architecture
Exploiting instruction level parallelism in processors by caching scheduled groups
Proceedings of the 24th annual international symposium on Computer architecture
DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture
Alternative fetch and issue policies for the trace cache fetch mechanism
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
On high-bandwidth data cache design for multi-issue processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving trace cache effectiveness with branch promotion and trace packing
Proceedings of the 25th annual international symposium on Computer architecture
Putting the fill unit to work: dynamic optimizations for trace cache microprocessors
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A Trace Cache Microarchitecture and Evaluation
IEEE Transactions on Computers - Special issue on cache memory and related problems
MPS: Miss-Path Scheduling for Multiple-Issue Processors
IEEE Transactions on Computers
Proceedings of the 27th annual international symposium on Computer architecture
Allowing for ILP in an embedded Java processor
Proceedings of the 27th annual international symposium on Computer architecture
IEEE Transactions on Computers
The Misprediction Recovery Cache
International Journal of Parallel Programming
Proceedings of the 30th annual international symposium on Computer architecture
Exploiting compiler-generated schedules for energy savings in high-performance processors
Proceedings of the 2003 international symposium on Low power electronics and design
Low-power, low-complexity instruction issue using compiler assistance
Proceedings of the 19th annual international conference on Supercomputing
Trace-Based runtime instruction rescheduling for architecture extension
ICESS'05 Proceedings of the Second international conference on Embedded Software and Systems
Hi-index | 0.01 |
Multiple issue of instructions occurs in superscalar and VLIW machines. This paper investigates a third type of machine design, which combines the advantages of code compatibility as in superscalars and the absence of complex dependency-checking logic from the decoder as in VLIW. In this design, a stream of scalar instructions is executed by the hardware and is simultaneously compacted into VLIW-type instructions, which are then stored in a structure called a shadow cache. When a shadow cache line contains the instructions requested by the fetch unit, the scalar instruction stream is preempted and all operations in the shadow cache line are simultaneously issued and executed. The mechanism that compacts instructions is called a fill unit, and was first proposed for dynamically compacting microoperations into large executable units by Melvin, Shebanow, and Patt in 1988. We have extended their approach to directly handle data dependencies, delayed branches, and speculative execution (using branch prediction). This approach is evaluated using the MIPS architecture, and a six-functional-unit machine is found to be 52 to 108% faster than a single-issue processor for unrecompiled SPECint92 benchmarks.