Blind Optimization for Exploiting Hardware Features

Authors:
Dan Knights;Todd Mytkowicz;Peter F. Sweeney;Michael C. Mozer;Amer Diwan
Affiliations:
Department of Computer Science, University of Colorado, Boulder;Department of Computer Science, University of Colorado, Boulder;Department of Computer Science, University of Colorado, Boulder;Department of Computer Science, University of Colorado, Boulder;Department of Computer Science, University of Colorado, Boulder
Venue:
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Year:
2009

Abstract

Software systems typically exploit only a small fraction of the performance realizable from the underlying microprocessors. While there has been much work on hardware-aware optimizations, two factors limit their benefit. First, microprocessors are so complex that even an aggressively optimizing compiler is unlikely to satisfy all the constraints necessary to obtain the best performance. Thus, most optimizations use a simplified model of the hardware (e.g., they may be cache-aware but ignore other hardware structures, such as TLBs). Second, hardware manufacturers do not reveal all details of their microprocessors, so even if the authors of optimizations wanted to optimize for all components of the hardware simultaneously, they may be unable to do so because they are working with limited knowledge. This paper presents and evaluates our blind optimization approach, which provides a way around these issues. Blind optimization uses the insight that we can generate many variants of an application by altering its semantics-preserving parameters; for example, our variants can cover the space of code and data layout by shifting the positions of code and data in memory. Our optimization strategy attempts to find a variant that performs well with respect to an optimization objective. We show that even our first implementation of blind optimization speeds up a number of programs from the SPECint 2006 benchmark suite.
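
The idea of searching over program variants without a hardware model can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the search strategy, so plain random search is used here as a stand-in, and `measure_cost`, `blind_search`, and the offset ranges are all hypothetical names and values chosen for illustration. In particular, `measure_cost` is a synthetic function standing in for building a variant with shifted code and data positions and timing it on real hardware.

```python
import random


def measure_cost(code_offset, data_offset):
    """Hypothetical stand-in for running one variant and reading a cost.

    A real setup would build the program with the given code and data
    shifts and measure it (e.g. in cycles). This synthetic surface only
    mimics an opaque, irregular response to layout changes.
    """
    return 1000 + ((code_offset * 7 + data_offset * 3) % 64) + abs(code_offset - 40) // 8


def blind_search(cost_fn, max_offset=128, step=8, trials=50, seed=0):
    """Random search over (code_offset, data_offset) variants.

    The search treats cost_fn as a black box: it never inspects why a
    layout is fast or slow, it only compares measured costs.
    """
    rng = random.Random(seed)
    choices = range(0, max_offset, step)
    best = (0, 0)                # baseline: the unshifted layout
    best_cost = cost_fn(*best)
    for _ in range(trials):
        candidate = (rng.choice(choices), rng.choice(choices))
        cost = cost_fn(*candidate)
        if cost < best_cost:     # keep the cheapest variant seen so far
            best, best_cost = candidate, cost
    return best, best_cost


baseline = measure_cost(0, 0)
best_variant, best_cost = blind_search(measure_cost)
print("baseline cost:", baseline)
print("best variant (code_offset, data_offset):", best_variant, "cost:", best_cost)
```

Because every variant is semantics-preserving, the search can only change performance, never behavior, and starting from the unshifted layout means the result is never worse than the baseline under the measured objective.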