The performance of an application can be improved by augmenting the processor with Application-specific Functional Units (AFUs). Usually, a cluster of operations identified in the application defines the behavior of an AFU. Several researchers have studied how the Input and Output (I/O) constraints on a legal operation cluster affect the overall achievable speedup, and the general observation is that the speedup potential grows as the I/O constraints are relaxed. Going further, in this paper we investigate the speedup potential of AFUs in the absence of I/O constraints. We address the design challenge that arises without I/O constraints in a practical manner, by identifying maximal convex subgraphs. The available register ports are usually few, whereas the number of inputs and outputs of the identified patterns is likely to be large. We overcome this register-port limitation by designing distributed I/O functional units, in which the operands are communicated over multiple cycles. The experimental results show that selecting maximal clusters achieves, on average, 50% higher speedup than selecting I/O-constrained operation clusters. In addition, our identification algorithm runs two to three orders of magnitude faster than an exhaustive identification approach.
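The abstract rests on two ideas. First, a legal operation cluster must be convex, meaning no path between two cluster nodes may leave the cluster. Second, once I/O constraints are dropped, a cluster's operands must be serialized over the few available register ports. The Python sketch below illustrates both ideas. It is not the paper's identification algorithm, which the abstract does not describe. It only checks a given cluster. Everything in it is an illustrative assumption:
- the dataflow graph `dfg` is a hypothetical successor dictionary;
- the load `ld1` is treated as a forbidden node;
- live-out values are ignored;
- the register file is assumed to have two read ports and one write port.

```python
from math import ceil

# Hypothetical dataflow graph (DFG): node -> list of successor nodes.
# a, b, c, d are external inputs; ld1 is a memory load (assumed forbidden in an AFU).
dfg: dict[str, list[str]] = {
    "a": ["add1"], "b": ["add1"], "c": ["mul1"], "d": ["mul1"],
    "add1": ["xor1", "ld1"],
    "mul1": ["xor1"],
    "ld1": ["sub1"],
    "xor1": ["sub1"],
    "sub1": [],
}
FORBIDDEN = {"ld1"}  # illustrative assumption: memory operations stay outside AFUs


def reachable(g, start):
    """Nodes reachable by one or more edges from any node in `start`."""
    seen = set()
    stack = [s for n in start for s in g.get(n, [])]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(g.get(n, []))
    return seen


def reverse(g):
    """Predecessor map of g."""
    r = {n: [] for n in g}
    for n, succs in g.items():
        for s in succs:
            r.setdefault(s, []).append(n)
    return r


def is_convex(g, cluster):
    """Convex iff no outside node is both a descendant and an ancestor of the cluster,
    i.e. no path between two cluster nodes passes through a node outside it."""
    below = reachable(g, cluster) - cluster
    above = reachable(reverse(g), cluster) - cluster
    return not (below & above)


def is_legal(g, cluster, forbidden):
    """Legal here means: convex and free of forbidden nodes (no I/O limit applied)."""
    return not (cluster & forbidden) and is_convex(g, cluster)


def io_counts(g, cluster):
    """(inputs, outputs): distinct outside producers feeding the cluster, and cluster
    nodes consumed outside it (live-out values are ignored for simplicity)."""
    preds = reverse(g)
    inputs = {p for n in cluster for p in preds.get(n, []) if p not in cluster}
    outputs = {n for n in cluster if any(s not in cluster for s in g.get(n, []))}
    return len(inputs), len(outputs)


def transfer_cycles(n_in, n_out, read_ports=2, write_ports=1):
    """Cycles to stream operands in and results out over limited register ports."""
    return ceil(n_in / read_ports), ceil(n_out / write_ports)


if __name__ == "__main__":
    print(is_convex(dfg, {"add1", "xor1", "sub1"}))  # path add1 -> ld1 -> sub1 leaves the cluster
    cluster = {"add1", "mul1", "xor1"}
    print(is_legal(dfg, cluster, FORBIDDEN), io_counts(dfg, cluster))
    print(transfer_cycles(*io_counts(dfg, cluster)))
```

In this toy graph, the cluster {add1, mul1, xor1} is convex and has 4 inputs and 2 outputs. A two-read, one-write register file would therefore need two cycles to deliver the operands and two to return the results. This serialization is the kind of cost that distributed I/O functional units accept in exchange for larger clusters.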