Towards a Systematic, Pragmatic and Architecture-Aware Program Optimization Process for Complex Processors

Authors:
David Parello;Olivier Temam;Albert Cohen;Jean-Marie Verdun
Affiliations:
HP France and HiPEAC network;INRIA Futurs, Paris Sud University and HiPEAC network;INRIA Futurs and HiPEAC network;HP France
Venue:
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Year:
2004

Citing 11
Cited 11

A quantitative analysis of loop nest locality

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
ProfileMe: hardware support for instruction-level profiling on out-of-order processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Adaptive Optimizing Compilers for the 21st Century

The Journal of Supercomputing
On increasing architecture awareness in program optimizations to bridge the gap between peak and sustained processor performance: matrix-multiply revisited

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Continuous program optimization: A case study

ACM Transactions on Programming Languages and Systems (TOPLAS)
DiST: a simple, reliable and scalable method to significantly reduce processor architecture simulation time

SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Using SimPoint for accurate and efficient simulation

SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A comparison of empirical and model-driven optimization

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Meta optimization: improving compiler heuristics with machine learning

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling

Proceedings of the 30th annual international symposium on Computer architecture
Evaluating iterative compilation

LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing

Facilitating the search for compositions of program transformations

Proceedings of the 19th annual international conference on Supercomputing
In search of a program generator to implement generic transformations for high-performance computing

Science of Computer Programming - Special issue on the first MetaOCaml workshop 2004
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

International Journal of Parallel Programming
Automatic performance model construction for the fast software exploration of new hardware designs

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
Rapidly Selecting Good Compiler Optimizations using Performance Counters

Proceedings of the International Symposium on Code Generation and Optimization
Quick and Practical Run-Time Evaluation of Multiple Program Optimizations

Transactions on High-Performance Embedded Architectures and Compilers I
Practical aggregation of semantical program properties for machine learning based optimization

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
A language for the compact representation of multiple program versions

LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A practical method for quickly evaluating program optimizations

HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Predictive modeling in a polyhedral optimization space

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Using graph-based program characterization for predictive modeling

Proceedings of the Tenth International Symposium on Code Generation and Optimization

Quantified Score

Hi-index	0.00

Visualization

Abstract

Because processor architectures are increasingly complex, it is increasingly difficult to embed accurate machine models within compilers. As a result, compiler efficiency tends to decrease. Currently, the trend is on top-down approaches: static compilers are progressively augmented with information from the architecture as in profile-based, iterative or dynamic compilation techniques. However, for the moment, fairly elementary architectural information is used. In this article, we adopt a bottom-up approach to the architecture complexity issue: we assume we know everything about the behavior of the program on the architecture. We present a manual but systematic process for optimizing a program on a complex processor architecture using extensive dynamic analysis, and we find that a small set of run-time information is sufficient to drive anefficient process. We have experimentally observed on an Alpha 21264 that this approach can yield significant performance improvement on Spec benchmarks, beyond peak Spec. We are currently using this approach for optimizing customer applications.