Rapid evaluation of custom instruction selection approaches with FPGA estimation

Authors:
Siew-Kei Lam; Thambipillai Srikanthan; Christopher T. Clarke
Affiliations:
Nanyang Technological University, Nanyang Drive, Singapore; Nanyang Technological University, Nanyang Drive, Singapore; University of Bath, Bath, United Kingdom
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2014

Abstract

The main aim of this article is to demonstrate that a fast and accurate FPGA estimation engine is indispensable in design flows for custom instruction (template) selection. The need for an FPGA estimation engine stems from the difficulty of predicting the FPGA performance measures of selected custom instructions. We present an FPGA estimation technique that partitions the high-level representation of custom instructions into clusters based on the structural organization of the target FPGA, while taking into account general logic synthesis principles adopted by FPGA tools. We evaluate a widely used graph covering algorithm with various heuristics for custom instruction selection. In addition, we present an algorithm called Refined Largest Fit First (RLFF) that relies on a graph covering heuristic to select non-overlapping superset templates, which typically incorporate frequently used basic templates. This initial solution is then refined by reconsidering the overlapping templates that were previously ignored, to determine whether introducing them could lead to higher performance. Although RLFF provides the most efficient cover compared to the ILP method and other graph covering heuristics, FPGA estimation results reveal that RLFF leads to the worst performance in certain applications. It is therefore worthwhile to equip design flows with accurate FPGA estimation in order to rapidly determine the most profitable custom instruction selection approach for a given application.
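
The abstract does not give the details of RLFF, so the following is only a minimal sketch of the general idea behind a largest-fit-first graph cover: candidate template matches are visited from largest to smallest, and a match is kept only if it shares no node with a match already chosen. The function name largest_fit_first, the representation of a match as a set of node ids, and the example data are all assumptions made for illustration; this is not the authors' implementation and it omits the refinement step.

```python
from typing import FrozenSet, List, Set


def largest_fit_first(matches: List[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Greedily pick non-overlapping template matches, largest first.

    Each match is the set of data-flow-graph node ids that one template
    instance would cover (a hypothetical representation).
    """
    covered: Set[int] = set()
    chosen: List[FrozenSet[int]] = []
    # Largest matches first; ties are broken by node ids for determinism.
    for match in sorted(matches, key=lambda m: (-len(m), sorted(m))):
        if covered.isdisjoint(match):  # keep only non-overlapping matches
            chosen.append(match)
            covered |= match
    return chosen


# Hypothetical graph with nodes 0..6 and five candidate template matches.
candidates = [
    frozenset({0, 1, 2}),
    frozenset({2, 3}),
    frozenset({3, 4}),
    frozenset({4, 5, 6}),
    frozenset({5, 6}),
]
cover = largest_fit_first(candidates)
print(cover)
```

In this example the two three-node matches are selected and node 3 is left uncovered, because both matches that contain it overlap a larger match chosen earlier. Overlapping candidates of this kind are what the refinement step described in the abstract would revisit.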