Branch alignment reorders the basic blocks of a program to minimize pipeline penalties due to control-transfer instructions. Prior work in branch alignment has produced useful heuristic methods. We present a branch alignment algorithm that usually achieves the minimum possible pipeline penalty and on our benchmarks averages within 0.3% of a provable optimum. We compare the control penalties and running times of our algorithm to an older, greedy approach and observe that both the greedy method and our method are close to the lower bound on control penalties, suggesting that greedy is good enough. Surprisingly, in actual execution our method produces programs that run noticeably faster than those produced by the greedy method. We also report results from training and testing on different data sets, validating that our results can be achieved in real-world usage. Training and testing on different data sets slightly reduces the benefits of both branch alignment algorithms, but the ranking of the algorithms does not change, and the bulk of the benefit remains.
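To make the idea of a greedy branch alignment concrete, the following is a minimal sketch of one common greedy strategy: chain basic blocks along the most frequently executed control-flow edges so that hot edges become fall-throughs. It is not the algorithm presented in the paper, and it is not necessarily the exact greedy method the paper compares against. The function names, the example control-flow graph, and the cost model (every executed edge that is not a fall-through costs one unit) are all assumptions made for illustration; real pipeline penalties depend on the target architecture.

```python
def greedy_layout(blocks, edges, entry):
    """Order basic blocks by greedily chaining the hottest edges.

    blocks: list of block names in original order (hypothetical input format)
    edges:  dict mapping (src, dst) -> profiled execution count
    entry:  name of the entry block, which must stay first
    """
    chain_of = {b: [b] for b in blocks}  # block -> the chain (list) holding it
    # Visit the hottest edges first; ties are broken by edge name.
    for (src, dst), _count in sorted(edges.items(), key=lambda kv: (-kv[1], kv[0])):
        a, b = chain_of[src], chain_of[dst]
        # Join only if src ends one chain and dst starts a different one,
        # and never place anything in front of the entry block.
        if a is not b and a[-1] == src and b[0] == dst and dst != entry:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a
    # Emit the entry chain first, then the remaining chains in original order.
    layout, seen = [], set()
    for blk in [entry] + list(blocks):
        chain = chain_of[blk]
        if id(chain) not in seen:
            seen.add(id(chain))
            layout.extend(chain)
    return layout


def taken_transfer_count(layout, edges):
    """Assumed cost model: total count of executed edges that are not fall-throughs."""
    pos = {b: i for i, b in enumerate(layout)}
    return sum(n for (s, d), n in edges.items() if pos[d] != pos[s] + 1)


# Hypothetical diamond-shaped CFG where the A -> C -> D path is hot.
blocks = ["A", "B", "C", "D"]
edges = {("A", "B"): 10, ("A", "C"): 90, ("B", "D"): 10, ("C", "D"): 90}

original_cost = taken_transfer_count(blocks, edges)
aligned = greedy_layout(blocks, edges, entry="A")
aligned_cost = taken_transfer_count(aligned, edges)
print(aligned, original_cost, aligned_cost)
```

On this toy graph the original order leaves the hot path broken by a taken branch, while the greedy order turns both hot edges into fall-throughs. Under the same simplified cost model, a lower bound of the kind mentioned above would be the cost of the best possible ordering, which a greedy pass like this one is not guaranteed to reach.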