Comparing software and hardware schemes for reducing the cost of branches

Authors:
W. W. Hwu; T. M. Conte; P. P. Chang
Affiliations:
Coordinated Science Laboratory, 1101 W. Springfield Ave., University of Illinois, Urbana, IL; Coordinated Science Laboratory, 1101 W. Springfield Ave., University of Illinois, Urbana, IL; Coordinated Science Laboratory, 1101 W. Springfield Ave., University of Illinois, Urbana, IL
Venue:
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Year:
1989

Citing 8
Cited 28

Reducing the cost of branches

ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Branch folding in the CRISP microprocessor: reducing branch delay to zero

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
An evaluation of branch architectures

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Reducing the Branch Penalty in Pipelined Processors

Computer
Trace selection for compiling large C application programs to microcode

MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
A study of branch prediction strategies

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
RISC I: A Reduced Instruction Set VLSI Computer

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
A Characterization of Processor Performance in the VAX-11/780

ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture

Inline function expansion for compiling C programs

PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Forward semantic: a compiler-assisted instruction fetch method for heavily pipelined processors

MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
Reducing the branch penalty by rearranging instructions in a double-width memory

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An empirical study of the CRAY Y-MP processor using the Perfect club benchmarks

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Two-level adaptive training branch prediction

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Limits of control flow on parallelism

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Alternative implementations of two-level adaptive branch prediction

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Performance optimization of pipelined primary cache

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Improving the accuracy of dynamic branch prediction using branch correlation

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Branch merging for effective exploitation of instruction-level parallelism

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A comprehensive instruction fetch mechanism for a processor supporting speculative execution

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Branch with masked squashing in superpipelined processors

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Characterizing the impact of predicated execution on branch prediction

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
The influence of branch prediction table interference on branch prediction scheme performance

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Compiler synthesized dynamic branch prediction

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Prophetic branches: a branch architecture for code compaction and efficient execution

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Superblock formation using static program analysis

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Implementation and analysis of path history in dynamic branch prediction schemes

ICS '97 Proceedings of the 11th international conference on Supercomputing
Multilevel Optimization of Pipelined Caches

IEEE Transactions on Computers
Alternative implementations of two-level adaptive branch prediction

25 years of the international symposia on Computer architecture (selected papers)
A dynamic scheduling logic for exploiting multiple functional units in single chip multithreaded architectures

Proceedings of the 1999 ACM symposium on Applied computing
Control flow optimization for supercomputer scalar processing

ICS '89 Proceedings of the 3rd international conference on Supercomputing
A brief survey of papers on scheduling for pipelined processors

ACM SIGPLAN Notices
Efficient Instruction Sequencing with Inline Target Insertion

IEEE Transactions on Computers
Three Architectural Models for Compiler-Controlled Speculative Execution

IEEE Transactions on Computers
The Performance of Counter- and Correlation-Based Schemes for Branch Target Buffers

IEEE Transactions on Computers
Static next sub-bank prediction for drowsy instruction cache

Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Compiler-guided next sub-bank prediction for reducing instruction cache leakage energy

Journal of Embedded Computing - Embedded Processors and Systems: Architectural Issues and Solutions for Emerging Applications

Quantified Score

Hi-index	0.03

Abstract

Pipelining has become a common technique for increasing the throughput of the instruction fetch, instruction decode, and instruction execution portions of modern computers. Branch instructions disrupt the flow of instructions through the pipeline, which raises their effective execution cost. Three schemes to reduce the cost of branches are presented in the context of a general pipeline model. Ten realistic Unix-domain programs are used to compare the cost and performance of the three schemes directly, and the results favor the software-based scheme. For example, for a highly pipelined processor (11-stage pipeline), the software-based scheme costs 1.65 cycles/branch versus 1.68 cycles/branch for the best hardware scheme. For a moderately pipelined processor (5-stage pipeline), the corresponding figures are 1.19 cycles/branch (software scheme) versus 1.23 cycles/branch (best hardware scheme).
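
To put the quoted figures in proportion, the following minimal Python sketch computes how far the software scheme's cost falls below the best hardware scheme's cost for each pipeline depth. Only the four cycles/branch values come from the abstract; the dictionary layout and the function name relative_saving are illustrative assumptions, not anything defined in the paper.

```python
# Cycles-per-branch figures quoted in the abstract.
# The keys and structure are illustrative, not taken from the paper.
costs = {
    "11-stage": {"software": 1.65, "best_hardware": 1.68},
    "5-stage": {"software": 1.19, "best_hardware": 1.23},
}


def relative_saving(software: float, hardware: float) -> float:
    """Fraction by which the software cost is below the hardware cost."""
    return (hardware - software) / hardware


# Relative saving of the software scheme for each pipeline depth.
savings = {
    name: relative_saving(c["software"], c["best_hardware"])
    for name, c in costs.items()
}

for name, saving in savings.items():
    print(f"{name} pipeline: software scheme costs {saving:.1%} less per branch")
```

On these numbers the gap is roughly 1.8% for the 11-stage pipeline and roughly 3.3% for the 5-stage pipeline, so the advantage reported for the software scheme is small in both cases.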