Accelerating an application domain with specialized functional units

Authors:
Cecilia González-Álvarez;Jennifer B. Sartor;Carlos Álvarez;Daniel Jiménez-González;Lieven Eeckhout
Affiliations:
Ghent University & UPC, Gent, Belgium;Ghent University, Gent, Belgium;UPC, Barcelona, Spain;UPC, Barcelona, Spain;Ghent University, Gent, Belgium
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2013


Abstract

Hardware specialization has received renewed interest as chips hit power limits. Designers of traditional processor architectures have focused primarily on general-purpose computing, partly because of time-to-market pressure and simpler design processes, but the new power limits require some degree of chip specialization. Although hardware configured for a specific application yields large speedups at low power dissipation, its design is more complex and less reusable. We instead explore domain-based specialization, a scalable approach that balances hardware reusability and performance efficiency. We focus on specialization using customized compute units that accelerate particular operations. In this article, we develop automatic techniques to identify code sequences, drawn from different applications within a domain, that can be mapped to a new custom instruction executed inside a configurable specialized functional unit (SFU). We demonstrate that a canonical representation of computations finds more common code sequences among applications that can be mapped to the same custom instruction, leading to larger speedups while specializing a smaller core area than previous pattern-matching techniques. We also propose new heuristics that narrow the search space of domain-specific custom instructions, finding those that achieve the best performance across applications. We estimate the overall performance achieved with our automatic techniques using hardware models on a set of nine media benchmarks, showing that, when the core area devoted to specialization is limited, the SFU customization with the largest speedups includes both application- and domain-specific custom instructions. We demonstrate that exploring domain-specific hardware acceleration is key to continued improvements in computing system performance.
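
To illustrate why a canonical representation can expose more shared code sequences than structural pattern matching, the following is a minimal sketch and not the authors' implementation. It assumes that each code sequence is a small arithmetic expression tree and uses an expanded polynomial normal form as a stand-in canonical representation. The tuple encoding and the helper names (`canonical`, `signature`) are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Assumption: a code sequence is a nested tuple such as
#   ('var', 'a'), ('const', 3), ('add', lhs, rhs), ('mul', lhs, rhs).
# A polynomial is a dict mapping a monomial (sorted tuple of variable
# names) to its integer coefficient.

def _add(p, q):
    out = defaultdict(int)
    for poly in (p, q):
        for mono, coeff in poly.items():
            out[mono] += coeff
    return {m: c for m, c in out.items() if c != 0}

def _mul(p, q):
    out = defaultdict(int)
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            # Sorting the variables makes a*c and c*a the same monomial.
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return {m: c for m, c in out.items() if c != 0}

def canonical(expr):
    """Expand an expression tree into its polynomial normal form."""
    op = expr[0]
    if op == 'var':
        return {(expr[1],): 1}
    if op == 'const':
        return {(): expr[1]} if expr[1] != 0 else {}
    lhs, rhs = canonical(expr[1]), canonical(expr[2])
    if op == 'add':
        return _add(lhs, rhs)
    if op == 'mul':
        return _mul(lhs, rhs)
    raise ValueError(f"unsupported operation: {op}")

def signature(expr):
    """Hashable, order-independent key for comparing code sequences."""
    return tuple(sorted(canonical(expr).items()))

a, b, c = ('var', 'a'), ('var', 'b'), ('var', 'c')

# Hypothetical sequence from one application: a * (b + c)
factored = ('mul', a, ('add', b, c))
# Hypothetical sequence from another application: a*b + c*a
expanded = ('add', ('mul', a, b), ('mul', c, a))

print(factored == expanded)                        # structural match
print(signature(factored) == signature(expanded))  # canonical match
```

In this sketch, the two sequences differ as trees, so a purely structural matcher would treat them as unrelated, while their canonical signatures coincide and both could be candidates for the same custom instruction. A real flow would also have to abstract away variable names and account for operation types and bit widths, which this example ignores.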