MAO -- An extensible micro-architectural optimizer

Authors:
Robert Hundt, Easwaran Raman, Martin Thuresson, Neil Vachharajani
Affiliation:
Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043 (all authors)
Venue:
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Year:
2011

Abstract

Performance matters, and so do repeatability and predictability. Today's processor micro-architectures have become so complex that they now contain many undocumented, not understood, and even puzzling performance cliffs. Small changes in the instruction stream, such as the insertion of a single NOP instruction, can lead to significant performance deltas, exposing compiler and performance-optimization efforts to what is perceived as unwanted randomness. This paper presents MAO, an extensible micro-architectural assembly-to-assembly optimizer, which seeks to address this problem for x86/64 processors. In essence, MAO is a thin wrapper around a common open-source assembler infrastructure. It offers basic operations, such as creation or modification of instructions and simple data-flow analysis, as well as advanced infrastructure, such as loop recognition and a repeated relaxation algorithm to compute instruction addresses and lengths. This infrastructure enables a plethora of passes for pattern matching, alignment-specific optimizations, peephole optimizations, experiments (such as random insertion of NOPs), and fast prototyping of more sophisticated optimizations. MAO can be integrated into any compiler that emits assembly code, or it can be used standalone. MAO can also be used to discover micro-architectural details semi-automatically. Initial performance results are encouraging.
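The "repeated relaxation" mentioned above is needed because on x86 the length of a branch depends on its displacement, and the displacement depends on the addresses, and therefore the lengths, of everything between the branch and its target. The following Python sketch illustrates the general technique under simplifying assumptions. It is not MAO's implementation. The `Insn` class, the `"op"`/`"label"`/`"jmp"`/`"jcc"` kinds, and `relax()` are hypothetical. Only unconditional `jmp` (2-byte rel8 or 5-byte rel32 form) and conditional `jcc` (2-byte rel8 or 6-byte rel32 form) are modeled, and every other instruction is treated as having a fixed, known size.

```python
from dataclasses import dataclass

# Hypothetical instruction model for illustration only; MAO's real IR differs.
@dataclass
class Insn:
    kind: str       # "op" (fixed-size instruction), "label", "jmp", or "jcc"
    size: int = 0   # encoded length in bytes (given for "op", computed for branches)
    name: str = ""  # label name (for "label") or branch target label (for "jmp"/"jcc")

SHORT = {"jmp": 2, "jcc": 2}  # EB rel8 / 7x rel8
NEAR = {"jmp": 5, "jcc": 6}   # E9 rel32 / 0F 8x rel32

def relax(insns):
    """Iteratively choose branch encodings until addresses reach a fixed point."""
    # Start optimistically: every branch uses its short (rel8) form.
    for i in insns:
        if i.kind in SHORT:
            i.size = SHORT[i.kind]
    while True:
        # Assign addresses using the current size estimates.
        addrs, labels, pc = [], {}, 0
        for i in insns:
            addrs.append(pc)
            if i.kind == "label":
                labels[i.name] = pc
            pc += i.size
        changed = False
        for i, a in zip(insns, addrs):
            if i.kind in SHORT and i.size == SHORT[i.kind]:
                # The displacement is relative to the end of the branch instruction.
                disp = labels[i.name] - (a + i.size)
                if not -128 <= disp <= 127:
                    # Grow to the rel32 form. Branches only ever grow, so the loop terminates.
                    i.size = NEAR[i.kind]
                    changed = True
        if not changed:
            return addrs, labels

if __name__ == "__main__":
    prog = [
        Insn("label", name="top"),
        Insn("op", size=130),
        Insn("jmp", name="end"),   # forward branch over the block below
        Insn("op", size=100),
        Insn("jcc", name="top"),   # backward branch, out of rel8 range
        Insn("op", size=23),
        Insn("label", name="end"),
    ]
    addrs, labels = relax(prog)
    print([i.size for i in prog], labels)
```

In this example, the backward `jcc` is out of rel8 range from the start and grows to 6 bytes on the first pass. That 4-byte growth raises the forward `jmp`'s displacement from 125 to 129, so it must grow to 5 bytes on the second pass. The third pass confirms a fixed point with the `end` label at address 264. This cascade is why a single pass over the code is not sufficient. Growing branches monotonically, rather than allowing them to shrink again, is a common design choice that guarantees termination at the cost of occasionally leaving a branch longer than strictly necessary.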