An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors

Authors:
Nathan Clark; Jason Blome; Michael Chu; Scott Mahlke; Stuart Biles; Krisztian Flautner
Affiliations:
University of Michigan, Ann Arbor (Clark, Blome, Chu, Mahlke); ARM, Ltd. (Biles, Flautner)
Venue:
Proceedings of the 32nd annual international symposium on Computer Architecture
Year:
2005

Abstract

Instruction set customization is an effective way to improve processor performance. Critical portions of application dataflow graphs are collapsed for accelerated execution on specialized hardware. Collapsing dataflow subgraphs compresses the latency along critical paths and reduces the number of intermediate results stored in the register file. While custom instructions can be effective, the time and cost of designing a new processor for each application are immense. To overcome this roadblock, this paper proposes a flexible architectural framework to transparently integrate custom instructions into a general-purpose processor. Hardware accelerators are added to the processor to execute the collapsed subgraphs. A simple microarchitectural interface is provided to support a plug-and-play model for integrating a wide range of accelerators into a pre-designed and verified processor core. The accelerators are exploited through a combination of static identification and dynamic realization: the compiler is responsible for identifying profitable subgraphs, while the hardware handles discovery, mapping, and execution of compatible subgraphs. This paper presents the design of a plug-and-play transparent accelerator system and evaluates its cost/performance implications.
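The abstract makes two claims about collapsing a dataflow subgraph: it shortens the critical path, and it reduces the number of intermediate results written to the register file. It also divides the work between compiler-side identification and hardware-side realization. The toy model below illustrates both ideas. It is a sketch under stated assumptions, not the paper's algorithm or accelerator design. The unit native-op latency, the `accel_latency` parameter, the `Accel` limits (input/output counts and supported opcodes), the convexity requirement, and every function name are hypothetical choices made for illustration. The hand-picked `chain` stands in for a subgraph the compiler has marked as profitable. `compatible` loosely models the hardware deciding whether an accelerator can execute that subgraph.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Op:
    name: str              # value produced by this operation
    opcode: str            # e.g. "add", "shl"
    srcs: Tuple[str, ...]  # operands: other ops' names or live-in registers


@dataclass(frozen=True)
class Accel:
    # Hypothetical accelerator limits, chosen for illustration only.
    max_inputs: int
    max_outputs: int
    supported: FrozenSet[str]


def subgraph_io(ops: Dict[str, Op], sub: Set[str], live_out: Set[str]):
    """Inputs: values the subgraph reads but does not produce.
    Outputs: values it produces that are read outside it or live out of the block."""
    ins = {s for n in sub for s in ops[n].srcs if s not in sub}
    outs = {n for n in sub
            if n in live_out or any(n in ops[m].srcs for m in ops if m not in sub)}
    return ins, outs


def is_convex(ops: Dict[str, Op], sub: Set[str]) -> bool:
    """False if a dependence path leaves `sub` and re-enters it, in which case
    the subgraph cannot execute as one atomic instruction."""
    memo: Dict[str, bool] = {}

    def from_sub(v: str) -> bool:  # is outside value v computed from `sub`?
        if v not in memo:
            memo[v] = any(s in sub or (s in ops and from_sub(s)) for s in ops[v].srcs)
        return memo[v]

    return not any(s in ops and s not in sub and from_sub(s)
                   for m in sub for s in ops[m].srcs)


def compatible(ops, sub, live_out, acc: Accel) -> bool:
    """Toy stand-in for the hardware-side check: can `acc` execute `sub`?"""
    ins, outs = subgraph_io(ops, sub, live_out)
    return (is_convex(ops, sub)
            and all(ops[n].opcode in acc.supported for n in sub)
            and len(ins) <= acc.max_inputs and len(outs) <= acc.max_outputs)


def critical_path(ops, sub=frozenset(), accel_latency=1, op_latency=1) -> int:
    """Longest dependence chain in cycles; ops in `sub` collapse into one node.
    Assumes every native op takes `op_latency` cycles."""
    memo: Dict[str, int] = {}

    def node_of(v):  # map a value to its node in the contracted graph
        return "<CI>" if v in sub else v

    def finish(node):
        if node not in memo:
            members = sub if node == "<CI>" else {node}
            preds = {node_of(s) for m in members for s in ops[m].srcs
                     if s in ops and node_of(s) != node}
            lat = accel_latency if node == "<CI>" else op_latency
            memo[node] = max((finish(p) for p in preds), default=0) + lat
        return memo[node]

    return max((finish(node_of(v)) for v in ops), default=0)


def register_writes(ops, sub, live_out) -> int:
    """Native ops each write one result; a collapsed subgraph writes only its outputs."""
    _, outs = subgraph_io(ops, sub, live_out)
    return len(ops) - len(sub) + len(outs)


# Example: a four-op dependent chain plus one independent op.
ops = {o.name: o for o in [
    Op("t1", "add", ("r1", "r2")),
    Op("t2", "shl", ("t1", "r3")),
    Op("t3", "and", ("t2", "r4")),
    Op("t4", "sub", ("t3", "r1")),
    Op("t5", "xor", ("r5", "r6")),
]}
live_out = {"t4", "t5"}
chain = {"t1", "t2", "t3", "t4"}  # subgraph a compiler might mark as profitable
acc = Accel(4, 2, frozenset({"add", "sub", "and", "or", "xor", "shl", "shr"}))

ins, outs = subgraph_io(ops, chain, live_out)
print(sorted(ins), sorted(outs))                          # ['r1', 'r2', 'r3', 'r4'] ['t4']
print(compatible(ops, chain, live_out, acc))              # True
print(critical_path(ops), critical_path(ops, chain, 2))   # 4 2
print(register_writes(ops, set(), live_out),
      register_writes(ops, chain, live_out))              # 5 2
```

This example assumes the collapsed chain takes two cycles. Under that assumption, collapsing shortens the critical path from four cycles to two. It also cuts register-file writes from five to two, because `t1` through `t3` never leave the accelerator. In this model, a subgraph that fails `compatible` (for example, on an accelerator limited to three inputs) would simply execute as ordinary instructions.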