Delayed branches are common in micro-architectures, and a compiler or assembler can exploit them by moving code from one of several points into the positions following the branch instruction. We present several strategies for moving code to utilize the branch delay and discuss the requirements and benefits of each strategy. An algorithm for processing branch delays has been implemented, and we give empirical results. The performance data show that a reasonable percentage of these delays can be avoided.
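
The idea of moving code into the position following a branch can be illustrated with a minimal sketch. It is not the algorithm described in the paper. It assumes a single delay slot, a basic block that ends in one branch followed by a `nop`, and register-only instructions with no memory side effects. It also covers only one source point, namely instructions that precede the branch in the same block. The names `Instr`, `conflicts`, and `fill_delay_slot` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical register-only instruction: one optional destination, some sources."""
    op: str
    dest: Optional[str] = None
    srcs: Tuple[str, ...] = ()


NOP = Instr("nop")


def conflicts(earlier: Instr, later: Instr) -> bool:
    """True if moving `earlier` past `later` could change the program's results."""
    raw = earlier.dest is not None and earlier.dest in later.srcs   # later reads what earlier writes
    waw = earlier.dest is not None and earlier.dest == later.dest   # both write the same register
    war = later.dest is not None and later.dest in earlier.srcs     # later overwrites an input of earlier
    return raw or waw or war


def fill_delay_slot(block: List[Instr]) -> List[Instr]:
    """Expects block == [..., branch, NOP]; returns a new list, with the slot filled if possible."""
    if len(block) < 2 or block[-1] != NOP:
        return list(block)
    branch_idx = len(block) - 2
    # Scan backward from the branch for an instruction that can legally move past
    # everything between itself and the slot, including the branch.
    for i in range(branch_idx - 1, -1, -1):
        cand = block[i]
        if cand == NOP:
            continue
        crossed = block[i + 1 : branch_idx + 1]
        if not any(conflicts(cand, other) for other in crossed):
            # Remove the candidate from its old position and put it in the slot.
            return block[:i] + crossed + [cand]
    return list(block)


add = Instr("add", "r1", ("r2", "r3"))
sub = Instr("sub", "r4", ("r5", "r6"))
beq = Instr("beq", None, ("r1", "r0"))

filled = fill_delay_slot([add, sub, beq, NOP])
unfilled = fill_delay_slot([add, beq, NOP])
```

In the first call, `sub` does not affect the branch condition, so it replaces the `nop` and the block becomes one instruction shorter. In the second call, the only candidate computes the register the branch tests, so the `nop` stays. Filling the slot from other source points would need additional safety conditions that this sketch does not model.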