Proceedings of the conference on Design, automation and test in Europe: Proceedings
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Code transformation strategies for extensible embedded processors
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Journal of VLSI Signal Processing Systems
Rethinking custom ISE identification: a new processor-agnostic method
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Efficient ASIP design for configurable processors with fine-grained resource sharing
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Fast, quasi-optimal, and pipelined instruction-set extensions
Proceedings of the 2008 Asia and South Pacific Design Automation Conference
Instruction set extension exploration in multiple-issue architecture
Proceedings of the conference on Design, automation and test in Europe
Speculative DMA for architecturally visible storage in instruction set extensions
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Rapid design of area-efficient custom instructions for reconfigurable embedded processing
Journal of Systems Architecture: the EUROMICRO Journal
An Algorithm for Finding Input-Output Constrained Convex Sets in an Acyclic Digraph
Graph-Theoretic Concepts in Computer Science
Recurrence-aware instruction set selection for extensible embedded processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
Code transformation and instruction set extension
ACM Transactions on Embedded Computing Systems (TECS)
An Application Development Framework for ARISE Reconfigurable Processors
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Fast enumeration of maximal valid subgraphs for custom-instruction identification
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Way Stealing: cache-assisted automatic instruction set extensions
Proceedings of the 46th Annual Design Automation Conference
Algorithms for generating convex sets in acyclic digraphs
Journal of Discrete Algorithms
Scalable register bypassing for FPGA-based processors
Microprocessors & Microsystems
Modern development methods and tools for embedded reconfigurable systems: A survey
Integration, the VLSI Journal
Proceedings of the 2009 International Conference on Computer-Aided Design
Design-space exploration of resource-sharing solutions for custom instruction set extensions
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Instruction set extension generation with considering physical constraints
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Proceedings of the 20th symposium on Great lakes symposium on VLSI
Fast, nearly optimal ISE identification with I/O serialization through maximal clique enumeration
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Selecting profitable custom instructions for reconfigurable processors
Journal of Systems Architecture: the EUROMICRO Journal
Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design
Proceedings of the 2010 Asia and South Pacific Design Automation Conference
A polynomial-time custom instruction identification algorithm based on dynamic programming
Proceedings of the 16th Asia and South Pacific Design Automation Conference
The Instruction-Set Extension Problem: A Survey
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
An efficient algorithm for custom instruction enumeration
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
Scheduling and resource binding algorithm considering timing variation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Practical and effective domain-specific function unit design for CGRA
ICCSA'11 Proceedings of the 2011 international conference on Computational science and Its applications - Volume Part V
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Accelerating loops for coarse grained reconfigurable architectures using instruction extensions
Proceedings of the 2011 ACM Symposium on Research in Applied Computation
HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
An algorithm for finding input-output constrained convex sets in an acyclic digraph
Journal of Discrete Algorithms
Instruction set architectural guidelines for embedded packet-processing engines
Journal of Systems Architecture: the EUROMICRO Journal
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Exact custom instruction enumeration for extensible processors
Integration, the VLSI Journal
International Journal of Reconfigurable Computing - Special issue on Selected Papers from the International Conference on Reconfigurable Computing and FPGAs (ReConFig'10)
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Complexity of computing convex subgraphs in custom instruction synthesis
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Loop acceleration exploration for ASIP architecture
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
LegUp: An open-source high-level synthesis tool for FPGA-based processor/accelerator systems
ACM Transactions on Embedded Computing Systems (TECS) - Special issue on application-specific processors
Considering the effect of process variations during the ISA extension design flow
Microprocessors & Microsystems
Accelerating an application domain with specialized functional units
ACM Transactions on Architecture and Code Optimization (TACO)
Instruction set extensions for dynamic time warping
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Rapid evaluation of custom instruction selection approaches with FPGA estimation
ACM Transactions on Embedded Computing Systems (TECS)
Extended Instruction Exploration for Multiple-Issue Architectures
ACM Transactions on Embedded Computing Systems (TECS)
A just-in-time customizable processor
Proceedings of the International Conference on Computer-Aided Design
Hi-index | 0.03 |
In embedded computing, cost, power, and performance constraints call for the design of specialized processors, rather than for the use of the existing off-the-shelf solutions. While the design of these application-specific CPUs could be tackled from scratch, a cheaper and more effective option is that of extending the existing processors and toolchains. Extensibility is indeed a feature now offered in real designs, e.g., by processors such as Tensilica Xtensa [T. R. Halfhill, Microprocess Rep., 2003], ARC ARCtangent [T. R. Halfhill, Microprocess Rep., 2000], STMicroelectronics ST200 [P. Faraboschi, G. Brown, J. A. Fisher, G. Desoli, and F. Homewood, Proc. 27th Annu. Int. Symp. Computer Architecture, 2000, p. 203], and MIPS CorExtend [T. R. Halfhill, Microprocess Rep., 2003]. While all these processors provide development environments with simulation capabilities for evaluating efficiently hand-crafted solutions, the tools to identify automatically the best processor configuration for a given application are less common. In particular, solutions to choose specialized instruction-set extensions (ISEs) have been investigated in the past years but are still seldom part of commercial toolchains. This paper provides a formal methodology and a set of algorithms that help address the problem. It proposes exact algorithms to derive optimal ISEs; exact identification of a single ISE is applicable to basic blocks of up to 1500 assembler-like instructions. This paper also introduces approximate methods that can process basic blocks of larger size. Results show that the described algorithms find solutions close to those that a designer would obtain by a detailed study of the application code. Both heuristic and exact algorithms find ISEs able to speed up unextended processors up to 5.0x. State-of-the-art comparisons show that the presented algorithms outperform existing ones by up to 2.6x