A comparison of empirical and model-driven optimization

Authors:
Kamen Yotov;Xiaoming Li;Gang Ren;Michael Cibulskis;Gerald DeJong;Maria Garzaran;David Padua;Keshav Pingali;Paul Stodghill;Peng Wu
Affiliations:
Cornell University;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;University of Illinois at Urbana-Champaign;Cornell University;Cornell University;IBM T.J. Watson Research Center
Venue:
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Year:
2003

Abstract

Empirical program optimizers estimate the values of key optimization parameters by generating different program versions and running them on the actual hardware to determine which values give the best performance. In contrast, conventional compilers use models of programs and machines to choose these parameters. It is widely believed that model-driven optimization does not compete with empirical optimization, but few quantitative comparisons have been done to date. To make such a comparison, we replaced the empirical optimization engine in ATLAS (a system for generating a dense numerical linear algebra library called the BLAS) with a model-driven optimization engine that used detailed models to estimate values for optimization parameters, and then measured the relative performance of the two systems on three different hardware platforms. Our experiments show that model-driven optimization can be surprisingly effective, and can generate code whose performance is comparable to that of code generated by empirical optimizers for the BLAS.
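
The contrast between the two approaches can be illustrated with a minimal sketch for one optimization parameter, the tile size of a blocked matrix multiplication. Everything here is an assumption made for illustration and is not taken from ATLAS or from the paper's models: the function names (`blocked_matmul`, `empirical_tile_size`, `model_tile_size`) are hypothetical, and the model is a simple capacity rule requiring that three square tiles of doubles fit in the cache.

```python
import math
import time


def blocked_matmul(a, b, n, nb):
    """Multiply two n x n matrices (lists of lists) using nb x nb tiles."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                for i in range(ii, min(ii + nb, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + nb, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + nb, n)):
                            row_c[j] += aik * row_b[j]
    return c


def empirical_tile_size(candidates, n=48):
    """Empirical approach: run every candidate tile size, keep the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    best_nb, best_time = None, math.inf
    for nb in candidates:
        start = time.perf_counter()
        blocked_matmul(a, b, n, nb)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_nb, best_time = nb, elapsed
    return best_nb


def model_tile_size(cache_bytes, element_bytes=8):
    """Model-driven approach (assumed rule): largest nb such that
    3 * nb * nb elements fit in a cache of cache_bytes bytes."""
    return math.isqrt(cache_bytes // (3 * element_bytes))


if __name__ == "__main__":
    print("empirical choice:", empirical_tile_size([4, 8, 16, 24, 48]))
    print("model choice for a 32 KiB cache:", model_tile_size(32 * 1024))
```

The empirical routine needs the target machine and costs one timed run per candidate, while the model routine needs only a description of the machine (here, an assumed cache capacity). In pure Python the timings mostly reflect interpreter overhead rather than cache behavior, so the sketch shows the shape of the two search strategies, not realistic tuning results.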