Computer
An Instruction Issuing Approach to Enhancing Performance in Multiple Functional Unit Processors
IEEE Transactions on Computers
Highly concurrent scalar processing
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
HPS, a new microarchitecture: rationale and introduction
MICRO 18 Proceedings of the 18th annual workshop on Microprogramming
Branch folding in the CRISP microprocessor: reducing branch delay to zero
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
WISQ: a restartable architecture using queues
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Architectural tradeoffs in the design of MIPS-X
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Checkpoint repair for high-performance out-of-order execution machines
IEEE Transactions on Computers
The performance potential of multiple functional unit processors
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Trace selection for compiling large C application programs to microcode
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Multiple instruction issue and single-chip processors
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Tradeoffs in instruction format design for horizontal architectures
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Inline function expansion for compiling C programs
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Comparing software and hardware schemes for reducing the cost of branches
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Forward semantic: a compiler-assisted instruction fetch method for heavily pipelined processors
MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
Control flow optimization for supercomputer scalar processing
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Multiple instruction issue in the NonStop cyclone processor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Implementation of precise interrupts in pipelined processors
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
The Design of the 88000 RISC Family
IEEE Micro
MICRO 15 Proceedings of the 15th annual workshop on Microprogramming
ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
A Characterization of Processor Performance in the vax-11/780
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Hpsm: exploiting concurrency to achieve high performance in a single-chip microarchitecture
Hpsm: exploiting concurrency to achieve high performance in a single-chip microarchitecture
Prophetic branches: a branch architecture for code compaction and efficient execution
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
A Practical Methodology for the Formal Verification of RISC Processors
Formal Methods in System Design
Proceedings of the 27th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Hi-index | 14.98 |
Inline target insertion, a specific compiler and pipeline implementation method for delayed branches with squashing, is defined. The method is shown to offer two important features not discovered in previous studies. First, branches inserted into branch slots are correctly executed. Second, the execution returns correctly from interrupts or exceptions with only one program counter. These two features result in better performance and less software/hardware complexity than conventional delayed branching mechanisms.