Scalable subgraph mapping for acyclic computation accelerators

Authors:
Nathan Clark;Amir Hormati;Scott Mahlke;Sami Yehia
Affiliations:
University of Michigan - Ann Arbor, MI;University of Michigan - Ann Arbor, MI;University of Michigan - Ann Arbor, MI;ARM Ltd., Cambridge, United Kingdom
Venue:
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Year:
2006

Abstract

Computer architects are constantly faced with the need to improve performance and increase the efficiency of computation in their designs. To this end, it is increasingly common to see acyclic computation accelerators appear in embedded processor designs. One major problem with adding accelerators to a design is that it is difficult to generate high-quality code that uses them. Hand-written assembly code is typical, and where compiler support does exist, it is implemented using only greedy algorithms. In this work, we investigate more thorough techniques for compiling to processors with acyclic accelerators. Whereas greedy solutions explore only one possible solution, the techniques presented in this paper explore the entire design space when possible. Intelligent pruning methods are employed to ensure compilation is both tractable and scalable. Overall, our new compilation algorithms produce code that performs on average 10%, and up to 32%, better than standard greedy methods. These algorithms also run in less than one second for more than 98% of basic blocks tested.
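
The contrast the abstract draws between greedy selection and pruned exploration of the full space can be illustrated with a toy model. The sketch below is not the paper's algorithm: it assumes, purely for illustration, that each candidate accelerator mapping is a set of dataflow-graph node ids with a hypothetical "cycles saved" value, that chosen mappings must not overlap, and that a simple optimistic bound stands in for the pruning. The names `greedy_select`, `exhaustive_select`, and `MATCHES` are invented for this example.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical model: (dataflow-graph node ids covered, cycles saved).
Match = Tuple[FrozenSet[int], int]


def greedy_select(matches: List[Match]) -> Tuple[int, List[Match]]:
    """Take matches in descending savings order, skipping any that overlap."""
    chosen: List[Match] = []
    used: set = set()
    for nodes, gain in sorted(matches, key=lambda m: -m[1]):
        if used.isdisjoint(nodes):
            chosen.append((nodes, gain))
            used |= nodes
    return sum(g for _, g in chosen), chosen


def exhaustive_select(matches: List[Match]) -> Tuple[int, List[Match]]:
    """Explore every include/exclude choice, pruning hopeless branches."""
    n = len(matches)
    # suffix[i] is the total gain of matches i..n-1: an optimistic bound.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + matches[i][1]

    best_gain = 0
    best_set: List[Match] = []

    def search(i: int, used: FrozenSet[int], gain: int, chosen: List[Match]) -> None:
        nonlocal best_gain, best_set
        if gain > best_gain:
            best_gain, best_set = gain, list(chosen)
        # Prune: even taking every remaining match cannot beat the best so far.
        if i == n or gain + suffix[i] <= best_gain:
            return
        nodes, g = matches[i]
        if used.isdisjoint(nodes):
            chosen.append(matches[i])
            search(i + 1, used | nodes, gain + g, chosen)
            chosen.pop()
        search(i + 1, used, gain, chosen)

    search(0, frozenset(), 0, [])
    return best_gain, best_set


# Toy input: the highest-gain match blocks two others that together save more.
MATCHES: List[Match] = [
    (frozenset({2, 3}), 3),
    (frozenset({1, 2}), 2),
    (frozenset({3, 4}), 2),
]

if __name__ == "__main__":
    print("greedy:", greedy_select(MATCHES)[0])
    print("exhaustive:", exhaustive_select(MATCHES)[0])
```

On this toy input the greedy pass commits to the single largest match and saves 3 cycles, while the pruned search finds the two smaller, non-overlapping matches that together save 4. The bound used here is deliberately crude; it only shows how pruning lets a full search skip branches that cannot improve on the best solution found so far.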