Code scheduling and register allocation in large basic blocks
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Greed is good: approximating independent sets in sparse and bounded-degree graphs
STOC '94 Proceedings of the twenty-sixth annual ACM symposium on Theory of computing
Instruction generation for hybrid reconfigurable systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Automatic application-specific instruction-set extensions under microarchitectural constraints
Proceedings of the 40th annual Design Automation Conference
A graph covering algorithm for a coarse grain reconfigurable system
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Processor Acceleration Through Automated Instruction Set Customization
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Characterizing embedded applications for instruction-set extensible processors
Proceedings of the 41st annual Design Automation Conference
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors
Proceedings of the 32nd annual international symposium on Computer Architecture
Exploiting pipelining to relax register-file port constraints of instruction-set extensions
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Architecture and compilation for data bandwidth improvement in configurable embedded processors
ICCAD '05 Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Bypass aware instruction scheduling for register file power reduction
Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
Exploiting forwarding to improve data bandwidth of instruction-set extensions
Proceedings of the 43rd annual Design Automation Conference
RISPP: rotating instruction set processing platform
Proceedings of the 44th annual Design Automation Conference
An efficient framework for dynamic reconfiguration of instruction-set customization
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Increasing data-bandwidth to instruction-set extensions through register clustering
Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design
An Energy-Efficient Processor Architecture for Embedded Systems
IEEE Computer Architecture Letters
AnySP: anytime anywhere anyway signal processing
Proceedings of the 36th annual international symposium on Computer architecture
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Operand Registers and Explicit Operand Forwarding
IEEE Computer Architecture Letters
PEPSC: A Power-Efficient Processor for Scientific Computing
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
The use of special instructions that execute complex operation patterns is a common approach in application specific processor design to improve performance and efficiency. However, in an embedded generic processor with compact instruction set architecture (ISA), such instructions may lead to large overhead as: i) more bits are needed to encode the extra opcodes and operands, resulting in wider instructions; ii) more register file (RF) ports are required to provide the extra operands to the function units. Such overhead may increase energy consumption considerably. In this paper, we propose to support flexible operation pair patterns in a processor with a compact 24-bit RISC-like ISA using: i) a partially reconfigurable decoder that exploits the locality of patterns to reduce the requirement for opcode space; ii) a software controlled bypass network to reduce the requirement for operand encoding and RF ports. We also propose an energy-aware compiler backend design for the proposed architecture that performs pattern selection and bypass-aware scheduling to generate energy efficient codes. Though proposed design imposes extra constraints on the operation patterns, the experimental results show that the average dynamic instruction count is reduced by over 25%, which is only about 2% less than the architecture without such constraints. Due to the low overhead, the total energy of the proposed architecture reduces by an average of 15.8% compared to the RISC baseline, while the one without constraints achieves almost no energy improvement.