Designing domain-specific processors
Proceedings of the ninth international symposium on Hardware/software codesign
Application-specific instruction generation for configurable processor architectures
FPGA '04 Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Scalable custom instructions identification for instruction-set extensible processors
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
The MOLEN Polymorphic Processor
IEEE Transactions on Computers
Automated Custom Instruction Generation for Domain-Specific Processor Acceleration
IEEE Transactions on Computers
Taylor Expansion Diagrams: A Canonical Representation for Verification of Data Flow Designs
IEEE Transactions on Computers
Variable ordering for taylor expansion diagrams
HLDVT '04 Proceedings of the High-Level Design Validation and Test Workshop, 2004. Ninth IEEE International
Application Specific Datapath Extension with Distributed I/O Functional Units
VLSID '07 Proceedings of the 20th International Conference on VLSI Design held jointly with 6th International Conference: Embedded Systems
Rethinking custom ISE identification: a new processor-agnostic method
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Organization of computer systems: the fixed plus variable structure computer
IRE-AIEE-ACM '60 (Western) Papers presented at the May 3-5, 1960, western joint IRE-AIEE-ACM computer conference
MediaBench II video: Expediting the next generation of video systems research
Microprocessors & Microsystems
Fast custom instruction identification by convex subgraph enumeration
ASAP '08 Proceedings of the 2008 International Conference on Application-Specific Systems, Architectures and Processors
Code transformation and instruction set extension
ACM Transactions on Embedded Computing Systems (TECS)
Fast enumeration of maximal valid subgraphs for custom-instruction identification
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Instruction Selection in ASIP Synthesis Using Functional Matching
VLSID '10 Proceedings of the 2010 23rd International Conference on VLSI Design
Conservation cores: reducing the energy of mature computations
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Understanding sources of inefficiency in general-purpose chips
Proceedings of the 37th annual international symposium on Computer architecture
Fast, nearly optimal ISE identification with I/O serialization through maximal clique enumeration
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Customizable Domain-Specific Computing
IEEE Design & Test
Dark silicon and the end of multicore scaling
Proceedings of the 38th annual international symposium on Computer architecture
Exact and approximate algorithms for the extension of embedded processor instruction sets
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
FISH: Fast Instruction SyntHesis for Custom Processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.00 |
Hardware specialization has received renewed interest recently as chips are hitting power limits. Chip designers of traditional processor architectures have primarily focused on general-purpose computing, partially due to time-to-market pressure and simpler design processes. But new power limits require some chip specialization. Although hardware configured for a specific application yields large speedups for low-power dissipation, its design is more complex and less reusable. We instead explore domain-based specialization, a scalable approach that balances hardware’s reusability and performance efficiency. We focus on specialization using customized compute units that accelerate particular operations. In this article, we develop automatic techniques to identify code sequences from different applications within a domain that can be targeted to a new custom instruction that will be run inside a configurable specialized functional unit (SFU). We demonstrate that using a canonical representation of computations finds more common code sequences among applications that can be mapped to the same custom instruction, leading to larger speedups while specializing a smaller core area than previous pattern-matching techniques. We also propose new heuristics to narrow the search space of domain-specific custom instructions, finding those that achieve the best performance across applications. We estimate the overall performance achieved with our automatic techniques using hardware models on a set of nine media benchmarks, showing that when limiting the core area devoted to specialization, the SFU customization with the largest speedups includes both application- and domain-specific custom instructions. We demonstrate that exploring domain-specific hardware acceleration is key to continued computing system performance improvements.