Efficient instruction scheduling for a pipelined architecture
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Branch folding in the CRISP microprocessor: reducing branch delay to zero
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Improving the accuracy of dynamic branch prediction using branch correlation
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Custom-fit processors: letting applications define architectures
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Customized instruction-sets for embedded processors
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Low-cost branch folding for embedded applications with small tight loops
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Static correlated branch prediction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Hi-index | 0.00 |
We present a methodology for microarchitectural customization of embedded processors by exploiting application information, thus attaining the twin benefits of processor standardization and application-specific customization. Such powerful techniques enable increased application fragments to be placed on the processor, with no sacrifice in system requirements, thus reducing the custom hardware and the concomitant area requirements in SOCs. We illustrate these ideas through the branch resolution problem, known to impose severe performance degradation on control-dominated embedded applications. A low-cost late customizable hardware that uses application information to fold out a set of frequently executed branches is described. Experimental results show that for a representative set of control dominated applications a reduction in the range of 7%-22% in processor cycles can be achieved, thus extending the scope of low-cost embedded processors in complex co-designs for control intensive systems.