ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
A portable global optimizer and linker
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Avoiding conditional branches by code replication
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Interprocedural conditional branch elimination
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Computer architecture (2nd ed.): a quantitative approach
Computer architecture (2nd ed.): a quantitative approach
Advanced compiler design and implementation
Advanced compiler design and implementation
ARM System-on-Chip Architecture
ARM System-on-Chip Architecture
Neural methods for dynamic branch prediction
ACM Transactions on Computer Systems (TOCS)
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Proceedings of the 2008 ACM symposium on Applied computing
Architecture Optimization of Application-Specific Implicit Instructions
ACM Transactions on Embedded Computing Systems (TECS) - Special Section on CAPA'09, Special Section on WHS'09, and Special Section VCPSS' 09
Hi-index | 0.00 |
A significant portion of a program's execution cycles are typically dedicated to performing conditional transfers of control. Much of the research on reducing the costs of these operations has focused on the branch, while the comparison has been largely ignored. In this paper we investigate reducing the cost of comparisons in conditional transfers of control. We decouple the specification of the values to be compared from the actual comparison itself, which now occurs as part of the branch instruction. The specification of the register or immediate values involved in the comparison is accomplished via a new instruction called a comparison specification, which is loop invariant. Decoupling the specification of the comparison from the actual comparison performed before the branch reduces the number of instructions in the loop, which provides performance benefits not possible when using conventional comparison instructions. Results from applying this technique on the ARM processor show that both the number of instructions executed and execution cycles are reduced.