Branch folding in the CRISP microprocessor: reducing branch delay to zero

Authors:
D. R. Ditzel; H. R. McLellan
Affiliations:
AT&T Bell Laboratories, Murray Hill, NJ; AT&T Bell Laboratories, Murray Hill, NJ
Venue:
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Year:
1987

Citing 13
Cited 41

Quantified Score

Hi-index	0.01

Abstract

A new method of implementing branch instructions is presented. This technique has been implemented in the CRISP microprocessor. With a combination of hardware and software techniques, the execution-time cost of many branches can be effectively reduced to zero. Branches are folded into other instructions, making their execution as separate instructions unnecessary. Branch Folding can reduce the apparent number of instructions needed to execute a program by the number of branches in that program, and it can also reduce or eliminate pipeline breakage. Statistics are presented demonstrating the effectiveness of Branch Folding and associated techniques used in the CRISP microprocessor.
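
The folding idea in the abstract can be illustrated with a small sketch. This is not the CRISP hardware design: the toy instruction set, the `Decoded` record with its `next_pc` field, and the `decode` and `run` helpers are all hypothetical names chosen for illustration, and only unconditional branches are modeled. Assuming each decoded entry records the address of its successor, a branch that follows a non-branch instruction can be absorbed into that instruction's `next_pc`, so the branch never issues as a separate instruction.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical toy ISA (an assumption for illustration, not the CRISP ISA):
#   ("op", None)    a non-branch instruction
#   ("jmp", target) an unconditional branch to instruction index `target`
#   ("halt", None)  stops execution
Instr = Tuple[str, Optional[int]]


@dataclass
class Decoded:
    op: str
    next_pc: int  # index of the entry to issue after this one


def decode(program: List[Instr], fold: bool) -> Dict[int, Decoded]:
    """Build decoded entries; with fold=True, absorb a following jmp."""
    decoded: Dict[int, Decoded] = {}
    for pc, (op, arg) in enumerate(program):
        if op == "jmp":
            # Kept as a standalone entry in case control reaches it directly.
            decoded[pc] = Decoded("jmp", arg)
            continue
        next_pc = pc + 1
        if fold and next_pc < len(program) and program[next_pc][0] == "jmp":
            # Fold: this instruction inherits the branch target as its successor.
            next_pc = program[next_pc][1]
        decoded[pc] = Decoded(op, next_pc)
    return decoded


def run(decoded: Dict[int, Decoded], start: int = 0, limit: int = 1000) -> int:
    """Count the entries issued until a halt entry (or the safety limit)."""
    pc, issued = start, 0
    while issued < limit:
        entry = decoded[pc]
        issued += 1
        if entry.op == "halt":
            break
        pc = entry.next_pc
    return issued


SAMPLE: List[Instr] = [
    ("op", None),    # 0
    ("jmp", 3),      # 1: folded into instruction 0 when fold=True
    ("op", None),    # 2: never reached
    ("op", None),    # 3
    ("jmp", 6),      # 4: folded into instruction 3 when fold=True
    ("op", None),    # 5: never reached
    ("halt", None),  # 6
]

if __name__ == "__main__":
    print("issued without folding:", run(decode(SAMPLE, fold=False)))
    print("issued with folding:   ", run(decode(SAMPLE, fold=True)))
```

In this sample the unfolded run issues five entries and the folded run issues three. The difference equals the two branches executed, which is the sense in which folding reduces the apparent instruction count by the number of branches.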