Branch strategies to optimize decision trees for wide-issue architectures

Authors:
Patrick Carribault;Christophe Lemuet;Jean-Thomas Acquaviva;Albert Cohen;William Jalby
Affiliations:
PRiSM, University of Versailles;PRiSM, University of Versailles;PRiSM, University of Versailles;ALCHEMY group, INRIA Futurs, Orsay;PRiSM, University of Versailles
Venue:
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Year:
2004

Abstract

Branch predictors are associated with critical design issues for today's instruction-hungry processors. We study two important domains where the optimization of decision trees (implemented through switch-case or nested if-then-else constructs) makes precise modeling of these hardware mechanisms decisive for performance: compute-intensive libraries with versioning and cloning, and high-performance interpreters. Contrary to common belief, the complexity of recent microarchitectures does not necessarily hamper the design of accurate cost models in the special case of decision trees. We build a simple model that illustrates why decision tree performance is predictable. Based on this model, we compare the most significant code generation strategies on the Itanium2 processor. We show that no strategy dominates in all cases and that indirect branches, although penalized on traditional superscalar processors, regain considerable interest in the context of predicated execution and delayed branches. We validate our study with an improvement of 15% to 40% over the Intel ICC compiler for a Daxpy code focused on short vectors.
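
To make the two families of strategies named in the abstract concrete, here is a minimal sketch in C of a versioned Daxpy for short vectors, dispatched once through nested if-then-else and once through an indirect branch. It is not the authors' code: the function names (`daxpy_if_tree`, `daxpy_indirect`, `check_strategies_agree`), the cutoff of four specialized versions, and the shape of the comparison tree are all assumptions chosen for illustration, and the sketch says nothing about which variant is faster on a given processor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical specialized versions of y += a*x for very short vectors. */
typedef void (*daxpy_fn)(size_t n, double a, const double *x, double *y);

static void daxpy_0(size_t n, double a, const double *x, double *y) {
    (void)n; (void)a; (void)x; (void)y;   /* nothing to do */
}
static void daxpy_1(size_t n, double a, const double *x, double *y) {
    (void)n;
    y[0] += a * x[0];
}
static void daxpy_2(size_t n, double a, const double *x, double *y) {
    (void)n;
    y[0] += a * x[0];
    y[1] += a * x[1];
}
static void daxpy_3(size_t n, double a, const double *x, double *y) {
    (void)n;
    y[0] += a * x[0];
    y[1] += a * x[1];
    y[2] += a * x[2];
}
/* Fallback version for any length. */
static void daxpy_generic(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Strategy 1 (assumed shape): nested if-then-else forming a comparison
 * tree, so each call executes a few conditional branches. */
void daxpy_if_tree(size_t n, double a, const double *x, double *y) {
    if (n < 2) {
        if (n == 0) daxpy_0(n, a, x, y);
        else        daxpy_1(n, a, x, y);
    } else if (n < 4) {
        if (n == 2) daxpy_2(n, a, x, y);
        else        daxpy_3(n, a, x, y);
    } else {
        daxpy_generic(n, a, x, y);
    }
}

/* Strategy 2 (assumed shape): one bounds check, then a single indirect
 * call through a table of function pointers. */
static const daxpy_fn daxpy_table[4] = { daxpy_0, daxpy_1, daxpy_2, daxpy_3 };

void daxpy_indirect(size_t n, double a, const double *x, double *y) {
    daxpy_fn f = (n < 4) ? daxpy_table[n] : daxpy_generic;
    f(n, a, x, y);
}

/* Runs both dispatchers on identical inputs for n = 0..6 and checks that
 * they produce the same result. All values are exactly representable. */
int check_strategies_agree(void) {
    for (size_t n = 0; n <= 6; ++n) {
        const double x[6] = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 };
        double y1[6] = { 0.5, 0.5, 0.5, 0.5, 0.5, 0.5 };
        double y2[6] = { 0.5, 0.5, 0.5, 0.5, 0.5, 0.5 };
        daxpy_if_tree(n, 2.0, x, y1);
        daxpy_indirect(n, 2.0, x, y2);
        for (size_t i = 0; i < 6; ++i)
            assert(y1[i] == y2[i]);
    }
    return 1;
}
```

Both dispatchers compute the same result; they differ only in the control flow the processor sees, which is the aspect the abstract says a cost model must capture. Comparing them fairly would require timing on the target machine, which this sketch does not attempt.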