Efficient retargetable code generation using bottom-up tree pattern matching
Computer Languages
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
BURG: fast optimal instruction selection and tree parsing
ACM SIGPLAN Notices
Engineering a simple, efficient code-generator generator
ACM Letters on Programming Languages and Systems (LOPLAS)
Beyond induction variables: detecting and classifying sequences using a demand-driven SSA form
ACM Transactions on Programming Languages and Systems (TOPLAS)
Instruction selection using binate covering for code size optimization
ICCAD '95 Proceedings of the 1995 IEEE/ACM international conference on Computer-aided design
Optimal code selection in DAGs
Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Optimal Code Generation for Expression Trees
Journal of the ACM (JACM)
Graph-based code selection techniques for embedded processors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Communications of the ACM
Register allocation for irregular architectures
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Fast and flexible instruction selection with on-demand tree-parsing automata
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Optimal chain rule placement for instruction selection based on SSA graphs
SCOPES '07 Proceedingsof the 10th international workshop on Software & compilers for embedded systems
Nearly optimal register allocation with PBQP
JMLC'06 Proceedings of the 7th joint conference on Modular Programming Languages
Code generation for STA architecture
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Converting intermediate code to assembly code using declarative machine descriptions
CC'06 Proceedings of the 15th international conference on Compiler Construction
A simple recipe for concise mixed 0-1 linearizations
Operations Research Letters
Instruction selection by graph transformation
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
SSA-based register allocation with PBQP
CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
Compiling for automatically generated instruction set extensions
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Intermediate representations in imperative compilers: A survey
ACM Computing Surveys (CSUR)
Hi-index | 0.00 |
Instruction selection is a well-studied compiler phase that translates the compiler's intermediate representation of programs to a sequence of target-dependent machine instructions optimizing for various compiler objectives (e.g. speed and space). Most existing instruction selection techniques are limited to the scope of a single statement or a basic block and cannot cope with irregular instruction sets that are frequently found in embedded systems. We consider an optimal technique for instruction selection that uses Static Single Assignment (SSA) graphs as an intermediate representation of programs and employs the Partitioned Boolean Quadratic Problem (PBQP) for finding an optimal instruction selection. While existing approaches are limited to instruction patterns that can be expressed in a simple tree structure, we consider complex patterns producing multiple results at the same time including pre/post increment addressing modes, div-mod instructions, and SIMD extensions frequently found in embedded systems. Although both instruction selection on SSA-graphs and PBQP are known to be NP-complete, the problem can be solved efficiently - even for very large instances. Our approach has been implemented in LLVM for an embedded ARMv5 architecture. Extensive experiments show speedups of up to 57% on typical DSP kernels and up to 10% on SPECINT 2000 and MiBench benchmarks. All of the test programs could be compiled within less than half a minute using a heuristic PBQP solver that solves 99.83% of all instances optimally.