Application-Specific Processing on a General-Purpose Core via Transparent Instruction Set Customization

Authors:
Nathan Clark;Manjunath Kudlur;Hyunchul Park;Scott Mahlke;Krisztian Flautner
Affiliations:
University of Michigan - Ann Arbor;University of Michigan - Ann Arbor;University of Michigan - Ann Arbor;University of Michigan - Ann Arbor;ARM Ltd., UK
Venue:
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Year:
2004

Abstract

Application-specific instruction set extensions are an effective way of improving the performance of processors. Critical computation subgraphs can be accelerated by collapsing them into new instructions that are executed on specialized function units. Collapsing the subgraphs simultaneously reduces both the length of the computation and the number of intermediate results stored in the register file. The main problem with this approach is that a new processor must be generated for each application domain. While new instructions can be designed automatically, a substantial engineering cost is incurred to verify and implement the final custom processor. In this work, we propose a strategy for transparent customization of the core computation capabilities of the processor without changing its instruction set. A configurable array of function units is added to the baseline processor that enables the acceleration of a wide range of data flow subgraphs. To exploit the array, the microarchitecture identifies subgraphs at run-time and replaces them with new microcode instructions that configure and utilize the array. We compare the effectiveness of replacing subgraphs in the fill unit of a trace cache versus using a translation table during decode, and evaluate the tradeoffs between static and dynamic identification of subgraphs for instruction set customization.
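
The idea of collapsing a data flow subgraph into one instruction can be illustrated with a small sketch. This is not the identification algorithm from the paper; it is a simplified greedy pass under stated assumptions: the block is straight-line code in SSA form (each register is written once), the set of operations supported by the array (`ARRAY_OPS`) and the size limit (`max_ops`) are made up, and all names (`Instr`, `collapse_subgraphs`, `register_writes`) are hypothetical.

```python
from collections import Counter, namedtuple

# One instruction: opcode, destination register (None for stores), sources.
Instr = namedtuple("Instr", "op dst srcs")

# Assumption: operations the function-unit array can execute (illustrative).
ARRAY_OPS = {"add", "sub", "and", "or", "xor", "shl"}


def collapse_subgraphs(block, live_out=(), max_ops=4):
    """Greedily fuse dependent array-supported ops into groups.

    A producer group is folded into its consumer only if its result has
    exactly one use in the block and is not live afterwards, so the
    intermediate value never needs a register-file write. Assumes SSA form,
    which makes sinking a pure producer down to its single use legal.
    """
    uses = Counter(src for ins in block for src in ins.srcs)
    groups = []    # each entry: list of Instr, or None once merged away
    producer = {}  # register -> index of the array-op group defining it
    for ins in block:
        merged = [ins]
        if ins.op in ARRAY_OPS:
            for src in ins.srcs:
                g = producer.get(src)
                if (g is not None and groups[g] is not None
                        and uses[src] == 1 and src not in live_out
                        and len(groups[g]) + len(merged) <= max_ops):
                    merged = groups[g] + merged
                    groups[g] = None
            producer[ins.dst] = len(groups)
        groups.append(merged)
    return [g for g in groups if g is not None]


def register_writes(groups):
    """Count results written to the register file (one per group with a dst)."""
    return sum(1 for g in groups if g[-1].dst is not None)


block = [
    Instr("load", "t0", ("a",)),
    Instr("shl", "t1", ("t0", "#2")),
    Instr("add", "t2", ("t1", "b")),
    Instr("xor", "t3", ("t2", "k")),
    Instr("and", "t4", ("t3", "m")),
    Instr("store", None, ("t4", "p")),
]

fused = collapse_subgraphs(block)
unfused = [[ins] for ins in block]
print(len(block), "instructions ->", len(fused), "after collapsing")
print(register_writes(unfused), "register writes ->", register_writes(fused))
```

In this example the four dependent ALU operations become one group, so the block shrinks from six instructions to three, and only `t0` and `t4` still need to be written to the register file. Marking a value as live after the block (for example `live_out={"t2"}`) stops the fusion at that value, which splits the chain into two smaller groups.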