In this paper we present a methodology for a low-power branch identification mechanism, which enables the design of extremely power efficient branch predictors for embedded processors. The proposed technique utilizes application-specific information regarding the control-flow structure of the program major loops. Such information is used to completely eliminate the power hungry Branch Target Buffer (BTB) lookups which normally occur at every execution cycle. Exact application knowledge regarding the control-flow structure of the program obviates the power expensive BTB operations, thus enabling the utilization of contemporary branch predictors in high-end, yet power-sensitive embedded processors. The utilization of exact application knowledge results not only in the complete elimination of the power hungry BTB structure but also in a perfect branch and target address identification. A cost-efficient and programmable hardware architecture for capturing the control-flow structure of the program is presented thereafter. The hardware complexity of the proposed architecture is carefully analyzed in terms of power, performance and area overhead. The proposed technique delivers power reductions in excess of 90% for a set of embedded benchmarks.