ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Branch folding in the CRISP microprocessor: reducing branch delay to zero
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Compile-Time Program Restructuring in Multiprogrammed Virtual Memory Systems
IEEE Transactions on Software Engineering
Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Branch history table prediction of moving target branches due to subroutine returns
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Strategies for branch target buffers
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Improving the accuracy of dynamic branch prediction using branch correlation
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Predicting conditional branch directions from previous runs of a program
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A comprehensive instruction fetch mechanism for a processor supporting speculative execution
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A comparison of dynamic branch predictors that use two levels of branch history
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Link-time optimization of address calculation on a 64-bit architecture
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Fast and accurate instruction fetch and branch prediction
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
ICS '90 Proceedings of the 4th international conference on Supercomputing
Optimal Sequential Partitions of Graphs
Journal of the ACM (JACM)
Improving locality by critical working sets
Communications of the ACM
Branch Target Buffer Design and Optimization
IEEE Transactions on Computers
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Corpus-based static branch prediction
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Performance issues in correlated branch prediction schemes
Proceedings of the 28th annual international symposium on Microarchitecture
The predictability of branches in libraries
Proceedings of the 28th annual international symposium on Microarchitecture
A system level perspective on branch architecture performance
Proceedings of the 28th annual international symposium on Microarchitecture
SPAID: software prefetching in pointer- and call-intensive environments
Proceedings of the 28th annual international symposium on Microarchitecture
Analysis of branch prediction via data compression
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Evidence-based static branch prediction using machine learning
ACM Transactions on Programming Languages and Systems (TOPLAS)
Hot cold optimization of large Windows/NT applications
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Efficient procedure mapping using cache line coloring
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Near-optimal intraprocedural branch alignment
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Improving performance by branch reordering
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Modeled and Measured Instruction Fetching Performance for Superscalar Microprocessors
IEEE Transactions on Parallel and Distributed Systems
Analyzing the working set characteristics of branch execution
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A scalable front-end architecture for fast instruction delivery
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Static correlated branch prediction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizations Enabled by a Decoupled Front-End Architecture
IEEE Transactions on Computers
Efficient and effective branch reordering using profile data
ACM Transactions on Programming Languages and Systems (TOPLAS)
An Exploration of Instruction Fetch Requirement in Out-of-Order Superscalar Processors
International Journal of Parallel Programming
Code Positioning for VLIW Architectures
HPCN Europe 2001 Proceedings of the 9th International Conference on High-Performance Computing and Networking
Branch Prediction Using Profile Data
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
A Novel Probabilistic Data Flow Framework
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Optimization opportunities created by global data reordering
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Optimizing indirect branch prediction accuracy in virtual machine interpreters
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Proceedings of the 30th annual international symposium on Computer architecture
IEEE Transactions on Computers
Collecting and Exploiting High-Accuracy Call Graph Profiles in Virtual Machines
Proceedings of the international symposium on Code generation and optimization
A first look at the interplay of code reordering and configurable caches
GLSVLSI '05 Proceedings of the 15th ACM Great Lakes symposium on VLSI
Code placement for improving dynamic branch prediction accuracy
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
The instruction register file micro-architecture
Future Generation Computer Systems - Special issue: Parallel computing technologies
Improving WCET by applying a WC code-positioning optimization
ACM Transactions on Architecture and Code Optimization (TACO)
The Camino Compiler infrastructure
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Optimizing indirect branch prediction accuracy in virtual machine interpreters
ACM Transactions on Programming Languages and Systems (TOPLAS)
HitME: low power Hit MEmory buffer for embedded systems
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Blind Optimization for Exploiting Hardware Features
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
The instruction register file micro-architecture
Future Generation Computer Systems - Special issue: Parallel computing technologies
Multicore-aware hybrid code positioning to reduce worst-case execution time
Proceedings of the 2010 Workshop on Interaction between Compilers and Computer Architecture
Studying microarchitectural structures with object code reordering
Proceedings of the Workshop on Binary Instrumentation and Applications
Code alignment for architectures with pipeline group dispatching
Proceedings of the 3rd Annual Haifa Experimental Systems Conference
Compiler techniques to improve dynamic branch prediction for indirect jump and call instructions
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Combining code reordering and cache configuration
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.01 |
Several researchers have proposed algorithms for basic block reordering. We call these branch alignment algorithms. The primary emphasis of these algorithms has been on improving instruction cache locality, and the few studies concerned with branch prediction reported small or minimal improvements. As wide-issue architectures become increasingly popular the importance of reducing branch costs will increase, and branch alignment is one mechanism which can effectively reduce these costs.In this paper, we propose an improved branch alignment algorithm that takes into consideration the architectural cost model and the branch prediction architecture when performing the basic block reordering. We show that branch alignment algorithms can improve a broad range of static and dynamic branch prediction architectures. We also show that a program performance can be improved by approximately 5% even when using recently proposed, highly accurate branch prediction architectures. The programs are compiled by any existing compiler and then transformed via binary transformations. When implementing these algorithms on a Alpha AXP 21604 up to a 16% reduction in total execution time is achieved.