ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Branch folding in the CRISP microprocessor: reducing branch delay to zero
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
An evaluation of branch architectures
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Trace selection for compiling large C application programs to microcode
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
RISC I: A Reduced Instruction Set VLSI Computer
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
A Characterization of Processor Performance in the vax-11/780
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Inline function expansion for compiling C programs
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Forward semantic: a compiler-assisted instruction fetch method for heavily pipelined processors
MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
Reducing the branch penalty by rearranging instructions in a double-width memory
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
An empirical study of the CRAY Y-MP processor using the Perfect club benchmarks
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Two-level adaptive training branch prediction
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Alternative implementations of two-level adaptive branch prediction
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Performance optimization of pipelined primary cache
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Improving the accuracy of dynamic branch prediction using branch correlation
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Branch merging for effective exploitation of instruction-level parallelism
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A comprehensive instruction fetch mechanism for a processor supporting speculative execution
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Branch with masked squashing in superpipelined processors
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Characterizing the impact of predicated execution on branch prediction
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
The influence of branch prediction table interference on branch prediction scheme performance
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Compiler synthesized dynamic branch prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Prophetic branches: a branch architecture for code compaction and efficient execution
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Superblock formation using static program analysis
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Implementation and analysis of path history in dynamic branch prediction schemes
ICS '97 Proceedings of the 11th international conference on Supercomputing
Multilevel Optimization of Pipelined Caches
IEEE Transactions on Computers
Alternative implementations of two-level adaptive branch prediction
25 years of the international symposia on Computer architecture (selected papers)
Proceedings of the 1999 ACM symposium on Applied computing
Control flow optimization for supercomputer scalar processing
ICS '89 Proceedings of the 3rd international conference on Supercomputing
A brief survey of papers on scheduling for pipelined processors
ACM SIGPLAN Notices
Efficient Instruction Sequencing with Inline Target Insertion
IEEE Transactions on Computers
Three Architectural Models for Compiler-Controlled Speculative Execution
IEEE Transactions on Computers
The Performance of Counter- and Correlation-Based Schemes for Branch Target Buffers
IEEE Transactions on Computers
Static next sub-bank prediction for drowsy instruction cache
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Compiler-guided next sub-bank prediction for reducing instruction cache leakage energy
Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Hi-index | 0.03 |
Pipelining has become a common technique to increase throughput of the instruction fetch, instruction decode, and instruction execution portions of modern computers. Branch instructions disrupt the flow of instructions through the pipeline, increasing the overall execution cost of branch instructions. Three schemes to reduce the cost of branches are presented in the context of a general pipeline model. Ten realistic Unix domain programs are used to directly compare the cost and performance of the three schemes and the results are in favor of the software-based scheme. For example, the software-based scheme has a cost of 1.65 cycles/branch vs. a cost of 1.68 cycles/branch of the best hardware scheme for a highly pipelined processor (11-stage pipeline). The results are 1.19 (software scheme) vs. 1.23 cycles/branch (best hardware scheme) for a moderately pipelined processor (5-stage pipeline).