Pipelining is the major organizational technique that computers use to reach higher single-processor performance. A fundamental disadvantage of pipelining is the performance lost to branches that require stalling or flushing the pipeline. Both hardware solutions and architectural changes have been proposed to overcome this problem. This paper examines a range of schemes for reducing branch cost, focusing on both static (compile-time) and dynamic (hardware-assisted) prediction of branches. These schemes are investigated from quantitative performance and implementation viewpoints.