Automatic application-specific instruction-set extensions under microarchitectural constraints
Proceedings of the 40th annual Design Automation Conference
Automatic generation of application specific processors
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Processor Acceleration Through Automated Instruction Set Customization
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Introduction of local memory elements in instruction set extensions
Proceedings of the 41st annual Design Automation Conference
Scalable custom instructions identification for instruction-set extensible processors
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
ISEGEN: Generation of High-Quality Instruction Set Extensions by Iterative Improvement
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Architecture Exploration for a Reconfigurable Architecture Template
IEEE Design & Test
Exploiting pipelining to relax register-file port constraints of instruction-set extensions
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Exploiting forwarding to improve data bandwidth of instruction-set extensions
Proceedings of the 43rd annual Design Automation Conference
Customizable Embedded Processors: Design Technologies and Applications
Customizable Embedded Processors: Design Technologies and Applications
Processor Description Languages
Processor Description Languages
A design flow for architecture exploration and implementation of partially reconfigurable processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
SAMOS '09 Proceedings of the 9th International Workshop on Embedded Computer Systems: Architectures, Modeling, and Simulation
Way Stealing: cache-assisted automatic instruction set extensions
Proceedings of the 46th Annual Design Automation Conference
Proceedings of the 2009 International Conference on Computer-Aided Design
Design-space exploration of resource-sharing solutions for custom instruction set extensions
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Fast, nearly optimal ISE identification with I/O serialization through maximal clique enumeration
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
Energy efficient special instruction support in an embedded processor with compact isa
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
The conflicting requirements of performance and flexibility in today's embedded system market are forcing system designers to use more and more of the so called Configurable or Customizable processor cores. Such processors tend to meet the demanding performance constraints by accommodating application specific Instruction-Set Extensions (ISEs) which have, naturally, become a vital component of current processor customization flows. One major bottleneck in maximizing ISE performance is the limitation on the data-bandwidth between the General Purpose Register(GPR) file and the ISEs. For improved performance, it is desirable to have a large data-bandwidth from the GPRs to ISEs. However, the tight area constraints of modern embedded processors often restrict the GPR I/O of ISEs to save port area of the register files. This paper presents a novel approach to increase the GPR I/O of ISEs without significantly increasing the size of the GPR files. This is achieved by applying the concept of register clustering, common in many VLIW architectures, to single-issue processors with high performance ISEs. Such clustering often causes extra register moves in compiled code. This work also presents an algorithm to minimize such register moves. The benchmark results presented in this paper show that our solution can significantly reduce the area overhead of many-port GPR files without sacrificing the performance improvements through ISEs.