Microprocessor Architectures: From VLIW to Tta
Microprocessor Architectures: From VLIW to Tta
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Using Dynamic Binary Translation to Fuse Dependent Instructions
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Dynamic Strands: Collapsing Speculative Dependence Chains for Reducing Pipeline Communication
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Dynamic coalescing for 16-bit instructions
ACM Transactions on Embedded Computing Systems (TECS)
Frequent Loop Detection Using Efficient Nonintrusive On-Chip Hardware
IEEE Transactions on Computers
MiBench: A free, commercially representative embedded benchmark suite
WWC '01 Proceedings of the Workload Characterization, 2001. WWC-4. 2001 IEEE International Workshop
Reducing the cost of conditional transfers of control by using comparison specifications
Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
Compilers: Principles, Techniques, and Tools (2nd Edition)
Compilers: Principles, Techniques, and Tools (2nd Edition)
Proceedings of the 2008 ACM symposium on Applied computing
Language-driven Exploration and Implementation of Partially Re-configurable ASIPs
Language-driven Exploration and Implementation of Partially Re-configurable ASIPs
Hi-index | 0.00 |
Dynamic configuration of application-specific implicit instructions has been proposed to better exploit the available parallelism at the instruction level in pipelined processors. The support of such implicit instruction issue-requires the pipeline to be extended with a trigger table that describes the instruction implicitly issued as a response to a value written into a triggering register by a triggering instruction (which may be an add or sub instruction). In this article, we explore the design optimization of the trigger table to maximize the number of instructions that can be implicitly issued while keeping the limited size of the trigger table. The concept of implicitly issued instruction has been formally defined by considering the inter-basic block analysis of control and data dependencies. A compilation tool chain has been developed to automatically identify the optimization opportunities, taking into account the constraints imposed by control and data dependencies as well as by architectural limitations. The proposed solutions have been applied to the case of a baseline scalar MIPS processor where, for the selected set of benchmarks (DSPStone and Mibench/automotive), we obtained an average speedup of 17%.