Instruction fetch unit for parallel execution of branch instructions

Authors:
Antonio González;José M. Llaberia
Affiliations:
Departamento de Arquitectura de Computadores, Facultad de Informática, Universidad Politécnica de Cataluña, 08028 Barcelona, Spain;Departamento de Arquitectura de Computadores, Facultad de Informática, Universidad Politécnica de Cataluña, 08028 Barcelona, Spain
Venue:
ICS '89 Proceedings of the 3rd international conference on Supercomputing
Year:
1989

Abstract

A mechanism to reduce the cost of branches in pipelined processors is presented. The technique combines a non-conventional cache (a branch target cache) with an early branch detection circuit. The instruction fetch unit (IFU) executes branches in parallel with the other instructions, so the execution-time cost of many branches can be effectively reduced to zero. The mechanism is evaluated with an analytical model in order to obtain the IFU design parameters. Simulation results show the effectiveness of the technique.
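
The abstract gives no implementation details, so the following is only a toy illustration of the general idea, not the authors' design. It assumes a fetch unit that resolves unconditional branches itself: a branch target cache hit redirects fetch at no cycle cost, and a miss pays a fixed penalty. The names `Instr` and `run`, the `fetches` bound, and the two-cycle `miss_penalty` are all hypothetical, and conditional branches and the early detection circuit are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str           # "alu" (ordinary instruction) or "jmp" (unconditional branch)
    target: int = -1  # branch target address; unused for "alu"


def run(program, start=0, fetches=9, miss_penalty=2):
    """Toy fetch-unit model. Returns (instructions issued, cycles spent).

    Assumption: branches are handled entirely in the fetch unit and are
    never issued to the execute stage.
    """
    btc = {}  # branch target cache: branch address -> target address
    pc = start
    issued = cycles = 0
    for _ in range(fetches):
        if not 0 <= pc < len(program):
            break  # fetch ran off the program
        ins = program[pc]
        if ins.op == "jmp":
            if pc in btc:
                pc = btc[pc]             # hit: redirect fetch, no cycles charged
            else:
                btc[pc] = ins.target     # miss: remember the target
                cycles += miss_penalty   # and pay the assumed penalty
                pc = ins.target
        else:
            issued += 1                  # ordinary instruction: one cycle
            cycles += 1
            pc += 1
    return issued, cycles


# A three-instruction loop: two ALU operations, then a jump back to address 0.
loop = [Instr("alu"), Instr("alu"), Instr("jmp", 0)]
print(run(loop, fetches=9))  # (6, 8): six ALU cycles plus one two-cycle miss
```

In this sketch only the first encounter of the branch costs cycles; the two later iterations hit in the cache and the branch adds nothing to the execution time, which is the effect the abstract describes as reducing branch cost to zero.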