Automatic empirical tuning of compiler optimizations has been widely used to achieve portable high performance for scientific applications. However, although power dissipation has become increasingly important in modern architecture design, few have attempted to empirically tune optimization configurations to reduce the power consumption of applications. We provide an automated empirical tuning framework that can be configured to optimize for both performance and energy efficiency. In particular, we extensively parameterize the configuration of a large number of compiler optimizations, including loop parallelization, blocking, unroll-and-jam, array copying, scalar replacement, strength reduction, and loop unrolling. We then use hardware counters combined with elapsed time to estimate both the performance and the power consumption of differently optimized code, and use these estimates to automatically discover desirable configurations for these optimizations. We use a power meter to verify our tuning results on two multi-core computers and show that our approach can effectively balance performance and energy efficiency on modern CMP machines.
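The tuning loop described above can be illustrated with a minimal Python sketch. It is not the framework's implementation: the linear energy model, the per-event weights, the energy-delay-product objective, and every name in the code (`estimate_energy`, `tune`, `fake_measure`, the counter names `l2_miss` and `instr`) are assumptions made for illustration. The function `fake_measure` stands in for compiling and running one code variant and reading its hardware counters, and returns synthetic numbers.

```python
import itertools


def estimate_energy(elapsed_s, counters, joules_per_event, idle_watts):
    """Assumed linear model: idle power over the run plus a cost per counted event."""
    dynamic = sum(joules_per_event[name] * count for name, count in counters.items())
    return idle_watts * elapsed_s + dynamic


def energy_delay_product(elapsed_s, energy_j):
    """One common way to weigh time against energy; lower is better."""
    return elapsed_s * energy_j


def tune(search_space, measure, objective, joules_per_event, idle_watts):
    """Exhaustively try every configuration and keep the one with the lowest score."""
    best_cfg, best_score = None, float("inf")
    names = sorted(search_space)
    for values in itertools.product(*(search_space[n] for n in names)):
        cfg = dict(zip(names, values))
        elapsed_s, counters = measure(cfg)  # run one variant, read its counters
        energy_j = estimate_energy(elapsed_s, counters, joules_per_event, idle_watts)
        score = objective(elapsed_s, energy_j)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def fake_measure(cfg):
    """Hypothetical stand-in for a real run; the numbers are synthetic."""
    block, unroll = cfg["block"], cfg["unroll"]
    elapsed_s = 1.0 + abs(block - 64) / 64 + abs(unroll - 4) / 8
    counters = {"l2_miss": 1000 * abs(block - 64) + 500, "instr": 10_000 // unroll}
    return elapsed_s, counters


# Hypothetical search space over a blocking factor and an unroll factor.
space = {"block": [16, 32, 64, 128], "unroll": [1, 2, 4, 8]}
weights = {"l2_miss": 1e-3, "instr": 1e-5}  # assumed joules per event
best_cfg, best_score = tune(space, fake_measure, energy_delay_product, weights, idle_watts=10.0)
print(best_cfg, best_score)
```

Swapping `energy_delay_product` for an objective that returns only `elapsed_s` or only `energy_j` would tune for performance or energy alone. In a real setting, `measure` would build and execute the variant and the estimate would be checked against a power meter, as the abstract describes.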