Code generation using tree matching and dynamic programming
ACM Transactions on Programming Languages and Systems (TOPLAS)
Symbolic Boolean manipulation with ordered binary-decision diagrams
ACM Computing Surveys (CSUR)
Interlock collapsing ALU for increased instruction-level parallelism
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A high-performance microarchitecture with hardware-programmable functional units
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Instruction selection using binate covering for code size optimization
ICCAD '95 Proceedings of the 1995 IEEE/ACM international conference on Computer-aided design
Instruction selection for embedded DSPs with complex instructions
EURO-DAC '96/EURO-VHDL '96 Proceedings of the conference on European design automation
An Algorithm for Subgraph Isomorphism
Journal of the ACM (JACM)
CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit
Proceedings of the 27th annual international symposium on Computer architecture
An efficient heuristic approach to solve the unate covering problem
DATE '00 Proceedings of the conference on Design, automation and test in Europe
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
Instruction generation and regularity extraction for reconfigurable processors
CASES '02 Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems
Instruction generation for hybrid reconfigurable systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
High-Performance 3-1 Interlock Collapsing ALU's
IEEE Transactions on Computers
Compiler Optimizations for Adaptive EPIC Processors
EMSOFT '01 Proceedings of the First International Workshop on Embedded Software
Automatic application-specific instruction-set extensions under microarchitectural constraints
Proceedings of the 40th annual Design Automation Conference
Garp: a MIPS processor with a reconfigurable coprocessor
FCCM '97 Proceedings of the 5th IEEE Symposium on FPGA-Based Custom Computing Machines
Automatic generation of application specific processors
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Using Dynamic Binary Translation to Fuse Dependent Instructions
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Common subgraph isomorphism detection by backtracking search
Software—Practice & Experience
Dataflow Mini-Graphs: Amplifying Superscalar Capacity and Bandwidth
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
ISEGEN: Generation of High-Quality Instruction Set Extensions by Iterative Improvement
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Static strands: safely collapsing dependence chains for increasing embedded power efficiency
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
An Architecture Framework for Transparent Instruction Set Customization in Embedded Processors
Proceedings of the 32nd annual international symposium on Computer Architecture
Automated Custom Instruction Generation for Domain-Specific Processor Acceleration
IEEE Transactions on Computers
Using minimal minterms to represent programmability
CODES+ISSS '05 Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Exploring the design space of LUT-based transparent accelerators
Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems
Negative thinking in branch-and-bound: the case of unate covering
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Synthesis of application specific instruction sets
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Exploiting Narrow Accelerators with Data-Centric Subgraph Mapping
Proceedings of the International Symposium on Code Generation and Optimization
Polynomial-time subgraph enumeration for automated instruction set extension
Proceedings of the conference on Design, automation and test in Europe
A code-generator generator for multi-output instructions
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Efficient ASIP design for configurable processors with fine-grained resource sharing
Proceedings of the 16th international ACM/SIGDA symposium on Field programmable gate arrays
Proceedings of the 45th annual Design Automation Conference
StageNetSlice: a reconfigurable microarchitecture building block for resilient CMP systems
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Compiling custom instructions onto expression-grained reconfigurable architectures
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Recurrence-aware instruction set selection for extensible embedded processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The StageNet fabric for constructing resilient multicore systems
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Fast enumeration of maximal valid subgraphs for custom-instruction identification
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
Dynamically utilizing computation accelerators for extensible processors in a software approach
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Selecting profitable custom instructions for reconfigurable processors
Journal of Systems Architecture: the EUROMICRO Journal
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
The Instruction-Set Extension Problem: A Survey
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
An efficient algorithm for custom instruction enumeration
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
Approximate graph clustering for program characterization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
SIMD defragmenter: efficient ILP realization on data-parallel architectures
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Exact custom instruction enumeration for extensible processors
Integration, the VLSI Journal
Compiling for automatically generated instruction set extensions
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Considering the effect of process variations during the ISA extension design flow
Microprocessors & Microsystems
Hi-index | 0.00 |
Computer architects are constantly faced with the need to improve performance and increase the efficiency of computation in their designs. To this end, it is increasingly common to see acyclic com-putation accelerators appear in embedded processor designs. One major problem with adding accelerators to a design is that it is difficult to generate high-quality code utilizing them. Hand-written assembly code is typical, and if compiler support does exist, it is implemented using only greedy algorithms. In this work, we investigate more thorough techniques for compiling to processors with acyclic accelerators. Where as greedy solutions only explore one possible solution, the techniques presented in this paper explore the entire design space, when possible. Intelligent pruning methods are employed to ensure compilation is both tractable and scalable. Overall, our new compilation algorithms produce code that performs on average 10%, and up to 32% better than standard greedy methods. These algorithms also run in less than one second for more than 98% of basic blocks tested.