An empirical evaluation of high level transformations for embedded processors
CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
VISTA: a system for interactive code improvement
Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
Array recovery and high-level transformations for DSP applications
ACM Transactions on Embedded Computing Systems (TECS)
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
Cache Models for Iterative Compilation
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Better tiling and array contraction for compiling scientific programs
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Embedded processor design challenges
Finding effective optimization phase sequences
Proceedings of the 2003 ACM SIGPLAN conference on Language, compiler, and tool for embedded systems
Fast searches for effective optimization phase sequences
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Finding effective compilation sequences
Proceedings of the 2004 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Adaptive java optimisation using instance-based learning
Proceedings of the 18th annual international conference on Supercomputing
A Model-Based Framework: An Approach for Profit-Driven Optimization
Proceedings of the international symposium on Code generation and optimization
Optimizing general purpose compiler optimization
Proceedings of the 2nd conference on Computing frontiers
ACME: adaptive compilation made efficient
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Probabilistic source-level optimisation of embedded programs
LCTES '05 Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Fast and efficient searches for effective optimization-phase sequences
ACM Transactions on Architecture and Code Optimization (TACO)
Generating new general compiler optimization settings
Proceedings of the 19th annual international conference on Supercomputing
Automatic Selection of Compiler Options Using Non-parametric Inferential Statistics
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Reduction Transformations for Optimization Parameter Selection
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Exhaustive Optimization Phase Order Space Exploration
Proceedings of the International Symposium on Code Generation and Optimization
Fast and Effective Orchestration of Compiler Optimizations for Automatic Performance Tuning
Proceedings of the International Symposium on Code Generation and Optimization
In search of near-optimal optimization phase orderings
Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
A software-only compression system for trading-offs between performance and code size
SCOPES '05 Proceedings of the 2005 workshop on Software and compilers for embedded systems
A New Genetic Algorithm for Loop Tiling
The Journal of Supercomputing
An approach toward profit-driven optimization
ACM Transactions on Architecture and Code Optimization (TACO)
VISTA: VPO interactive system for tuning applications
ACM Transactions on Embedded Computing Systems (TECS)
Parameterized tiled loops for free
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Microarchitecture Sensitive Empirical Models for Compiler Optimizations
Proceedings of the International Symposium on Code Generation and Optimization
Evaluating Heuristic Optimization Phase Order Search Algorithms
Proceedings of the International Symposium on Code Generation and Optimization
Rapidly Selecting Good Compiler Optimizations using Performance Counters
Proceedings of the International Symposium on Code Generation and Optimization
Proceedings of the conference on Design, automation and test in Europe
PEAK—a fast and effective performance tuning system via compiler optimization orchestration
ACM Transactions on Programming Languages and Systems (TOPLAS)
Program optimization space pruning for a multithreaded gpu
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Fast cycle-approximate instruction set simulation
SCOPES '08 Proceedings of the 11th international workshop on Software & compilers for embedded systems
Multi-level tiling: M for the price of one
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Iterative optimization in the polyhedral model: part ii, multidimensional time
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
Program optimization carving for GPU computing
Journal of Parallel and Distributed Computing
Positivity, posynomials and tile size selection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Adaptive Loop Tiling for a Multi-cluster CMP
ICA3PP '08 Proceedings of the 8th international conference on Algorithms and Architectures for Parallel Processing
A tuning framework for software-managed memory hierarchies
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Practical exhaustive optimization phase order exploration and evaluation
ACM Transactions on Architecture and Code Optimization (TACO)
Automated transformation for performance-critical kernels
LCSD '07 Proceedings of the 2007 Symposium on Library-Centric Software Design
A Framework for Exploring Optimization Properties
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Tuning parallel applications in parallel
Parallel Computing
Power profile estimation and compiler-based software optimization for mobile devices
Journal of Embedded Computing - PATMOS 2007 selected papers on low power electronics
Constructing an optimisation phase using grammatical evolution
CEC'09 Proceedings of the Eleventh conference on Congress on Evolutionary Computation
Improving both the performance benefits and speed of optimization phase sequence searches
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
MiDataSets: creating the conditions for a more realistic evaluation of Iterative optimization
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Automatic creation of tile size selection models
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Software power peak reduction on smart card systems based on iterative compiling
EUC'07 Proceedings of the 2007 conference on Emerging direction in embedded and ubiquitous computing
An adaptive strategy for inline substitution
CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
Eliminating false phase interactions to reduce optimization phase order search space
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Collective optimization: A practical collaborative approach
ACM Transactions on Architecture and Code Optimization (TACO)
On the impact of data input sets on statistical compiler tuning
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Auto-tuning full applications: A case study
International Journal of High Performance Computing Applications
An evaluation of different modeling techniques for iterative compilation
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Using machine learning to improve automatic vectorization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Journal of Parallel and Distributed Computing
Evaluating iterative compilation
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Loop transformation recipes for code generation and auto-tuning
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Effective source-to-source outlining to support whole program empirical optimization
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Iterative collective loop fusion
CC'06 Proceedings of the 15th international conference on Compiler Construction
Predictive modeling in a polyhedral optimization space
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Panacea: towards holistic optimization of MapReduce applications
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Portable section-level tuning of compiler parallelized applications
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A multi-objective auto-tuning framework for parallel codes
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Finding good optimization sequences covering program space
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Software thread integration for instruction-level parallelism
ACM Transactions on Embedded Computing Systems (TECS)
Preliminary results for neuroevolutionary optimization phase order generation for static compilation
Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems
Hi-index | 0.00 |
Loop tiling and unrolling are two important program transformations to exploit locality and expose instruction level parallelism, respectively. In this paper, we address the problem of how to select tile sizes and unroll factors simultaneously. We approach this problem in an architecturally adaptive manner by means of iterative compilation, where we generate many versions of a program and decide upon the best by actually executing them and measuring their execution time. We evaluate several iterative strategies. We compare the levels of optimization obtained by iterative compilation to several well-known static techniques and show that we outperform each of them on a range of benchmarks across a variety of architectures. Finally, we show how to quantitatively trade-off the number of profiles needed and the level of optimization that can be reached.