In application-specific processor design, a common way to improve performance and efficiency is to add special instructions that execute complex operation patterns. However, in a generic embedded processor with a compact Instruction Set Architecture (ISA), these special instructions may incur substantial overhead: (i) more bits are needed to encode the extra opcodes and operands, resulting in wider instructions; (ii) more Register File (RF) ports are required to supply the extra operands to the functional units. Such overhead may increase energy consumption considerably. In this article, we propose to support flexible operation-pair patterns in a processor with a compact 24-bit RISC-like ISA using: (i) a partially reconfigurable decoder that exploits pattern locality to reduce the opcode space requirement; (ii) a software-controlled bypass network that reduces the operand encoding bits and RF ports required. An energy-aware compiler backend is designed for the proposed architecture; it performs pattern selection and bypass-aware scheduling to generate energy-efficient code. Although the proposed design imposes extra constraints on the operation patterns, experimental results show that, for benchmark applications from different domains, the average dynamic instruction count is reduced by over 25%, which is only about 2% less than that of the architecture without such constraints. The proposed architecture reduces total energy by an average of 15.8% compared to the RISC baseline, while the unconstrained architecture achieves almost no improvement because of its high overhead. When high performance is required, introducing multicycle SFU operations allows the proposed architecture to achieve a speedup of 13.8% with a 13.1% energy reduction compared to the baseline.
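To make the bypass idea concrete, the following is a minimal sketch, not the scheduler described in the article. It assumes a toy model in which an operand produced by one of the previous few operations can be taken from the bypass network instead of the RF. The function name `count_rf_reads`, the `bypass_window` parameter, and the three-operation sequence are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Each operation is (destination register, source registers).
Op = Tuple[str, Tuple[str, ...]]


def count_rf_reads(ops: List[Op], bypass_window: int = 0) -> int:
    """Count register-file reads for a straight-line operation sequence.

    Toy assumption: a source operand needs no RF read if it was produced
    by one of the previous `bypass_window` operations, because it can be
    forwarded over the bypass network instead.
    """
    reads = 0
    for i, (_, srcs) in enumerate(ops):
        # Destinations of the operations still visible on the bypass network.
        recent = {dest for dest, _ in ops[max(0, i - bypass_window):i]}
        for src in srcs:
            if src not in recent:
                reads += 1
    return reads


# Hypothetical three-operation sequence with back-to-back dependences.
ops: List[Op] = [
    ("r1", ("r2", "r3")),  # r1 = r2 + r3
    ("r4", ("r1", "r5")),  # r4 = r1 * r5
    ("r6", ("r4", "r1")),  # r6 = r4 - r1
]

baseline_reads = count_rf_reads(ops, bypass_window=0)
bypassed_reads = count_rf_reads(ops, bypass_window=1)
print(baseline_reads, bypassed_reads)
```

In this toy model, every operand forwarded over the bypass network is one fewer RF read and one fewer register index to encode. A bypass-aware scheduler would therefore try to place a consumer close to its producer.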