Satisfying real-time constraints with custom instructions
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Novel architecture for loop acceleration: a case study
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Exploiting pipelining to relax register-file port constraints of instruction-set extensions
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Hardware/software managed scratchpad memory for embedded system
Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design
Battery-aware instruction generation for embedded processors
Proceedings of the 2005 Asia and South Pacific Design Automation Conference
Automating processor customisation: optimised memory access and resource sharing
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Heterogeneous multiprocessor implementations for JPEG:: a case study
CODES+ISSS '06 Proceedings of the 4th international conference on Hardware/software codesign and system synthesis
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Design methodology for pipelined heterogeneous multiprocessor system
Proceedings of the 44th annual Design Automation Conference
Efficient ASIP design for configurable processors with fine-grained resource sharing
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Architectural exploration of heterogeneous multiprocessor systems for JPEG
International Journal of Parallel Programming - Special Issue on Multiprocessor-based embedded systems
Proceedings of the 13th international symposium on Low power electronics and design
Analysis and design of a hardware/software trusted platform module for embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Rapid design of area-efficient custom instructions for reconfigurable embedded processing
Journal of Systems Architecture: the EUROMICRO Journal
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
A scalable synthesis methodology for application-specific processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Modern development methods and tools for embedded reconfigurable systems: A survey
Integration, the VLSI Journal
Proceedings of the 2009 International Conference on Computer-Aided Design
Proceedings of the 20th symposium on Great lakes symposium on VLSI
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The Instruction-Set Extension Problem: A Survey
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
The Journal of Supercomputing
Hardware-software co-design of AES on FPGA
Proceedings of the International Conference on Advances in Computing, Communications and Informatics
ACM Transactions on Design Automation of Electronic Systems (TODAES)
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Complexity of computing convex subgraphs in custom instruction synthesis
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
LegUp: An open-source high-level synthesis tool for FPGA-based processor/accelerator systems
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Rapid evaluation of custom instruction selection approaches with FPGA estimation
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.03 |
Efficiency and flexibility are critical, but often conflicting, design goals in embedded system design. The recent emergence of extensible processors promises a favorable tradeoff between efficiency and flexibility, while keeping design turnaround times short. Current extensible processor design flows automate several tedious tasks, but typically require designers to manually select the parts of the program that are to be implemented as custom instructions. In this work, we describe an automatic methodology to select custom instructions to augment an extensible processor, in order to maximize its efficiency for a given application program. We demonstrate that the number of custom instruction candidates grows rapidly with program size, leading to a large design space, and that the quality (speedup) of custom instructions varies significantly across this space, motivating the need for the proposed flow. Our methodology features cost functions to guide the custom instruction selection process, as well as static and dynamic pruning techniques to eliminate inferior parts of the design space from consideration. Furthermore, we employ a two-stage process, wherein a limited number of promising instruction candidates are first short-listed using efficient selection criteria, and then evaluated in more detail through cycle-accurate instruction set simulation and synthesis of the corresponding hardware, to identify the custom instruction combinations that result in the highest program speedup or maximize speedup under a given area constraint. We have evaluated the proposed techniques using a state-of-the-art extensible processor platform, in the context of a commercial design flow. Experiments with several benchmark programs indicate that custom processors synthesized using automatic custom instruction selection can result in large improvements in performance (up to 5.4×, an average of 3.4×), energy (up to 4.5×, an average of 3.2×), and energy-delay products (up to 24.2×, an average of 12.6×), while speeding up the design process significantly.