Custom-instruction synthesis for extensible-processor platforms

Authors:
Fei Sun; S. Ravi; A. Raghunathan; N. K. Jha
Affiliations:
Dept. of Electr. Eng., Princeton Univ., NJ, USA; -; -; -
Venue:
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Year:
2006

Abstract

Efficiency and flexibility are critical, but often conflicting, design goals in embedded system design. The recent emergence of extensible processors promises a favorable tradeoff between efficiency and flexibility, while keeping design turnaround times short. Current extensible processor design flows automate several tedious tasks, but typically require designers to manually select the parts of the program that are to be implemented as custom instructions. In this work, we describe an automatic methodology to select custom instructions to augment an extensible processor, in order to maximize its efficiency for a given application program. We demonstrate that the number of custom instruction candidates grows rapidly with program size, leading to a large design space, and that the quality (speedup) of custom instructions varies significantly across this space, motivating the need for the proposed flow. Our methodology features cost functions to guide the custom instruction selection process, as well as static and dynamic pruning techniques to eliminate inferior parts of the design space from consideration. Furthermore, we employ a two-stage process, wherein a limited number of promising instruction candidates are first short-listed using efficient selection criteria, and then evaluated in more detail through cycle-accurate instruction set simulation and synthesis of the corresponding hardware, to identify the custom instruction combinations that result in the highest program speedup or maximize speedup under a given area constraint. We have evaluated the proposed techniques using a state-of-the-art extensible processor platform, in the context of a commercial design flow. 
Experiments with several benchmark programs indicate that custom processors synthesized using automatic custom instruction selection can result in large improvements in performance (up to 5.4×, an average of 3.4×), energy (up to 4.5×, an average of 3.2×), and energy-delay products (up to 24.2×, an average of 12.6×), while speeding up the design process significantly.
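
The two-stage selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Candidate` fields, the cycles-saved-per-area ranking, the additive savings model, and all numbers are assumptions made for illustration. In the actual flow, the second stage relies on cycle-accurate instruction set simulation and hardware synthesis, which the `detailed_speedup` stand-in below only imitates.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Candidate:
    """Hypothetical custom-instruction candidate (all fields are assumptions)."""
    name: str
    est_cycles_saved: int  # cheap estimate of cycles saved over the whole program
    area: int              # hardware cost in arbitrary area units


def shortlist(candidates, keep):
    """Stage 1: rank by a cheap cost function and keep the most promising few."""
    ranked = sorted(candidates,
                    key=lambda c: c.est_cycles_saved / c.area,
                    reverse=True)
    return ranked[:keep]


def detailed_speedup(combo, base_cycles):
    """Stage 2 stand-in: a real flow would simulate and synthesize each combination.

    Here the savings are simply assumed to add up, which ignores overlap
    between candidates.
    """
    saved = sum(c.est_cycles_saved for c in combo)
    return base_cycles / (base_cycles - saved)


def select(candidates, base_cycles, area_budget, keep):
    """Return the short-listed combination with the best speedup within the budget."""
    best_combo, best_speedup = (), 1.0
    short = shortlist(candidates, keep)
    for size in range(1, len(short) + 1):
        for combo in combinations(short, size):
            if sum(c.area for c in combo) > area_budget:
                continue  # prune combinations that violate the area constraint
            speedup = detailed_speedup(combo, base_cycles)
            if speedup > best_speedup:
                best_combo, best_speedup = combo, speedup
    return best_combo, best_speedup


# Illustrative data only; none of these numbers come from the paper.
candidates = [
    Candidate("mac", est_cycles_saved=300, area=40),
    Candidate("sbox", est_cycles_saved=250, area=50),
    Candidate("shift_add", est_cycles_saved=100, area=10),
    Candidate("bitrev", est_cycles_saved=50, area=30),
    Candidate("crc_step", est_cycles_saved=20, area=25),
]
combo, speedup = select(candidates, base_cycles=1000, area_budget=60, keep=3)
print([c.name for c in combo], round(speedup, 2))
```

Short-listing keeps the expensive second stage tractable: only combinations of the `keep` best candidates are evaluated in detail, rather than every subset of the full candidate set.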