ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
An evaluation of branch architectures
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Architectural tradeoffs in the design of MIPS-X
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Comparing software and hardware schemes for reducing the cost of branches
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Branch Strategies: Modeling and Optimization (Pipeline Processing)
IEEE Transactions on Computers
Predicting conditional branch directions from previous runs of a program
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Prophetic branches: a branch architecture for code compaction and efficient execution
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Fast Prolog with an extended general purpose architecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
A Prolog Benchmark Suite for Aquarius
A Prolog Benchmark Suite for Aquarius
Minimizing branch misprediction penalties for superpipelined processors
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Cache design trade-offs for power and performance optimization: a case study
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Hi-index | 0.00 |
The performance of a superpipeline processor heavily relies on its branch performance. Traditional branch strategies used in pipelined processors are delayed branches and branch with squashing. Delayed branches use safe instructions to fill delay slots. However, for a deeply pipelined processor, a compiler may not be able to find sufficient safe instructions to fill the branch delay slots. Branch with squashing takes advantage of using instructions in target basic blocks to fill the branch delay slots. However, the penalty of branch misprediction is large in superpipelined processors.In this paper, we proposed a novel branch scheme, named branch with masked squashing, which takes advantage of both delayed branch and branch with squashing. The basic idea is to fill delay slots with safe instructions which may come from above or after the branch. For the remaining unfilled delay slots, we fill with instructions from the predicted target path. In the case of misprediction, only unsafe instructions are annulled. The safe instructions in branch delay slots are always executed. Simulation results show that this branch strategy performs better than traditional delayed branch and branch with squashing.