Near-optimal intraprocedural branch alignment

Authors:
Cliff Young; David S. Johnson; Michael D. Smith; David R. Karger
Affiliations:
Harvard University; AT&T Labs; Harvard University; Massachusetts Institute of Technology
Venue:
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Year:
1997



Abstract

Branch alignment reorders the basic blocks of a program to minimize pipeline penalties due to control-transfer instructions. Prior work in branch alignment has produced useful heuristic methods. We present a branch alignment algorithm that usually achieves the minimum possible pipeline penalty and on our benchmarks averages within 0.3% of a provable optimum. We compare the control penalties and running times of our algorithm to an older, greedy approach and observe that both the greedy method and our method are close to the lower bound on control penalties, suggesting that greedy is good enough. Surprisingly, in actual execution our method produces programs that run noticeably faster than the greedy method. We also report results from training and testing on different data sets, validating that our results can be achieved in real-world usage. Training and testing on different data sets slightly reduced the benefits from both branch alignment algorithms, but the ranking of the algorithms does not change, and the bulk of the benefits remain.
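
To make the idea of a control penalty concrete, the following minimal sketch scores block layouts for a tiny, made-up control-flow graph. It is not the algorithm from the paper. The edge counts, the block names, and the cost model (every executed edge that is not a fall-through costs one penalty unit) are assumptions chosen for illustration; real pipelines charge different amounts for different kinds of control transfer. The greedy routine is a simple hottest-edge-first chaining heuristic, and the exhaustive search stands in for a lower bound that is only feasible on toy inputs.

```python
from itertools import permutations

# Hypothetical profile data: (source block, destination block) -> execution count.
EDGE_COUNTS = {
    ("A", "B"): 90, ("A", "C"): 10,
    ("B", "D"): 90, ("C", "D"): 10,
}
BLOCKS = ["A", "B", "C", "D"]
ENTRY = "A"


def penalty(layout, edge_counts, taken_cost=1):
    """Sum the cost of every executed edge that is not a fall-through.

    Assumed cost model: an edge src -> dst is free when dst is placed
    immediately after src, and costs `taken_cost` per execution otherwise.
    """
    next_block = dict(zip(layout, layout[1:]))
    return sum(count * taken_cost
               for (src, dst), count in edge_counts.items()
               if next_block.get(src) != dst)


def greedy_layout(blocks, edge_counts, entry):
    """Chain blocks together, considering the hottest edges first."""
    chains = {b: [b] for b in blocks}   # chain head -> blocks in layout order
    head_of = {b: b for b in blocks}    # block -> head of the chain holding it
    for (src, dst), _ in sorted(edge_counts.items(), key=lambda kv: -kv[1]):
        src_head = head_of[src]
        # Link only if src ends its chain, dst starts a different chain,
        # and dst is not the entry block (which must stay first).
        if (chains[src_head][-1] == src and dst in chains
                and head_of[dst] != src_head and dst != entry):
            for b in chains[dst]:
                head_of[b] = src_head
            chains[src_head].extend(chains.pop(dst))
    # Place the entry chain first, then the remaining chains.
    ordered = [chains.pop(head_of[entry])] + list(chains.values())
    return [b for chain in ordered for b in chain]


def optimal_layout(blocks, edge_counts, entry):
    """Try every layout that starts with the entry block (toy sizes only)."""
    rest = [b for b in blocks if b != entry]
    candidates = ([entry, *p] for p in permutations(rest))
    return min(candidates, key=lambda layout: penalty(layout, edge_counts))


if __name__ == "__main__":
    for name, layout in [
        ("source order", BLOCKS),
        ("greedy", greedy_layout(BLOCKS, EDGE_COUNTS, ENTRY)),
        ("exhaustive", optimal_layout(BLOCKS, EDGE_COUNTS, ENTRY)),
    ]:
        print(name, layout, penalty(layout, EDGE_COUNTS))
```

On this toy graph the greedy and exhaustive layouts coincide, which mirrors the abstract's observation that greedy layouts can sit close to the lower bound on control penalties. The exhaustive search grows factorially with the number of blocks, so it is useful only as a reference point on very small procedures.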