PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Accurate static estimators for program optimization
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Static branch frequency and program profile analysis
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Improving the accuracy of indirect branch prediction via branch classification
ACM SIGARCH Computer Architecture News - Special issue on Interact-3 workshop
Design tradeoffs for the Alpha EV8 conditional branch predictor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dynamic Branch Prediction with Perceptrons
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Power Efficient Processor Architecture and The Cell Processor
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Optimizing Compiler for the CELL Processor
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Introduction to the cell multiprocessor
IBM Journal of Research and Development - POWER5 and packaging
Proceedings of the 44th annual Design Automation Conference
SDRM: simultaneous determination of regions and function-to-region mapping for scratchpad memories
HiPC'08 Proceedings of the 15th international conference on High performance computing
Heap data management for limited local memory (LLM) multi-core processors
CODES/ISSS '10 Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Extending the cell SPE with energy efficient branch prediction
EuroPar'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I
Stack data management for Limited Local Memory (LLM) multi-core processors
ASAP '11 Proceedings of the ASAP 2011 - 22nd IEEE International Conference on Application-specific Systems, Architectures and Processors
Elimination of redundant memory traffic in high-level synthesis
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Hi-index | 0.00 |
As power-efficiency becomes paramount concern in processor design, architectures are coming up that completely do away with hardware branch prediction, and rely solely on software branch hinting. A popular example is the Synergistic Processing Unit (SPU) in the IBM Cell processor. To be able to minimize the branch penalty using branch hint instructions, in addition to estimating the branch probabilities (which has been looked at before [6, 25, 24]), it is important to carefully insert branch hints. Towards this, in this paper, we i) construct a branch penalty model for compiler, ii) formulate the problem of minimizing branch penalty using branch hinting and iii) propose a heuristic to solve this problem. The heuristic is based on three basic techniques that we introduce in this paper: NOP padding, hint pipelining, and nested loop restructuring. Experimental results on several benchmarks show that our solution can reduce the branch penalty as much as 35.4% over the previous approach.