A new method of implementing branch instructions is presented. This technique has been implemented in the CRISP Microprocessor. With a combination of hardware and software techniques, the execution-time cost of many branches can effectively be reduced to zero. Branches are folded into other instructions, so they no longer need to execute as separate instructions. Branch Folding can reduce the apparent number of instructions needed to execute a program by the number of branches in that program, and can also reduce or eliminate pipeline breakage. Statistics are presented demonstrating the effectiveness of Branch Folding and associated techniques used in the CRISP Microprocessor.
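The claim that folding removes one issued instruction per branch can be illustrated with a minimal Python sketch. It rests on several assumptions. The ISA is a hypothetical four-opcode mini-ISA (`dec`, `jmp`, `bnz`, `halt`) with a single counter register. The decoded-entry format is also hypothetical: a branch that immediately follows a non-branch instruction is folded into that instruction's entry. The fold supplies a successor address (`next_pc`) and, for a conditional branch, an alternate address (`alt_pc`). The names `Instr`, `Entry`, `decode` and `run` are illustrative only. This is not CRISP's actual decoded-instruction format. It ignores pipeline timing and misprediction cost and counts only issued entries.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instr:
    """One instruction of a hypothetical mini-ISA: dec, jmp, bnz, halt."""
    op: str
    target: Optional[int] = None  # branch target for jmp/bnz


@dataclass
class Entry:
    """Hypothetical decoded entry: the operation issued plus successor address(es)."""
    op: str
    next_pc: int                  # successor (taken path for a conditional)
    alt_pc: Optional[int] = None  # set only for conditional control flow


BRANCHES = ("jmp", "bnz")


def decode(prog: List[Instr], fold: bool) -> List[Entry]:
    """Build one decoded entry per address; optionally fold branches."""
    entries = []
    for pc, ins in enumerate(prog):
        if ins.op == "jmp":
            entry = Entry(ins.op, ins.target)
        elif ins.op == "bnz":
            # Branch to target if counter != 0; otherwise fall through to pc + 1.
            entry = Entry(ins.op, ins.target, pc + 1)
        else:
            entry = Entry(ins.op, pc + 1)
            nxt = prog[pc + 1] if pc + 1 < len(prog) else None
            if fold and ins.op != "halt" and nxt is not None and nxt.op in BRANCHES:
                # Fold the following branch into this entry so that, on this
                # path, it never issues as a separate instruction.
                entry.next_pc = nxt.target
                if nxt.op == "bnz":
                    entry.alt_pc = pc + 2  # not-taken path skips the folded branch
        entries.append(entry)
    return entries


def run(prog: List[Instr], counter: int, fold: bool) -> int:
    """Execute prog and return the number of entries issued."""
    entries = decode(prog, fold)
    pc, issued = 0, 0
    while True:
        entry = entries[pc]
        issued += 1
        if entry.op == "halt":
            return issued
        if entry.op == "dec":
            counter -= 1
        # Resolve control flow with bnz semantics, standalone or folded.
        if entry.alt_pc is None:
            pc = entry.next_pc
        else:
            pc = entry.next_pc if counter != 0 else entry.alt_pc


# A countdown loop: dec; bnz 0; halt
loop = [Instr("dec"), Instr("bnz", 0), Instr("halt")]
print(run(loop, 3, fold=False), run(loop, 3, fold=True))  # prints: 7 4
```

For the three-iteration countdown loop, the unfolded run issues 7 entries and the folded run issues 4. The difference, 3, equals the number of `bnz` executions. This mirrors the reduction in apparent instruction count described above. The standalone branch entry is kept at its own address so that code jumping directly to it still behaves correctly.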