Efficient instruction scheduling for a pipelined architecture
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Branch folding in the CRISP microprocessor: reducing branch delay to zero
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Improving the accuracy of dynamic branch prediction using branch correlation
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Custom-fit processors: letting applications define architectures
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Customized instruction-sets for embedded processors
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
Low-cost branch folding for embedded applications with small tight loops
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Static correlated branch prediction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automated design of finite state machine predictors for customized processors
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Data cache energy minimizations through programmable tag size matching to the applications
Proceedings of the 14th international symposium on Systems synthesis
Performance and power effectiveness in embedded processors customizable partitioned caches
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Application specific non-volatile primary memory for embedded systems
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Instruction Cache Locking for Embedded Systems using Probability Profile
Journal of Signal Processing Systems
Hi-index | 0.01 |
We present a customization framework for embedded processors which employs the utilization of application-specific information, thus specializing the processor's microarchitecture to the application needs. The increased processor utilization leads to a low-cost system implementation with no sacrifice in performance requirements and to reduced custom hardware in a typical SOC. We illustrate these ideas through the branch resolution problem, known to impose severe performance degradation on control-dominated embedded applications. A customization approach for early branch resolution and subsequent folding is presented. The application-specific information is captured by the microarchitecture through a low-cost reprogrammable hardware, thus attaining the twin benefits of processor standardization and application-specific customization. Experimental results show that for a representative set of control-dominated applications a reduction in the range of 3--22% in processor cycles can be achieved, thus extending the scope of low-cost embedded processors in complex codesigns for control intensive systems.