Using graph-based program characterization for predictive modeling

  • Authors:
  • Eunjung Park;John Cavazos;Marco A. Alvarez

  • Affiliations:
  • University of Delaware;University of Delaware;Utah State University

  • Venue:
  • Proceedings of the Tenth International Symposium on Code Generation and Optimization
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Using machine learning has proven effective at choosing the right set of optimizations for a particular program. For machine learning techniques to be most effective, compiler writers have to develop expressive means of characterizing the program being optimized. The current state-of-the-art techniques for characterizing programs include using a fixed-length feature vector of either source code features extracted during compile time or performance counters collected when running the program. For the problem of identifying optimizations to apply, models constructed using performance counter characterizations of a program have been shown to outperform models constructed using source code features. However, collecting performance counters requires running the program multiple times, and this "dynamic" method of characterizing programs can be specific to inputs of the program. It would be preferable to have a method of characterizing programs that is as expressive as performance counter features, but that is "static" like source code features and therefore does not require running the program. In this paper, we introduce a novel way of characterizing programs using a graph-based characterization, which uses the program's intermediate representation and an adapted learning algorithm to predict good optimization sequences. To evaluate different characterization techniques, we focus on loop-intensive programs and construct prediction models that drive polyhedral optimizations, such as auto-parallelism and various loop transformation. We show that our graph-based characterization technique outperforms three current state-of-the-art characterization techniques found in the literature. By using the sequences predicted to be the best by our graph-based model, we achieved up to 73% of the speedup achievable in our search space for a particular platform, whereas we could only achieve up to 59% by other state-of-the-art techniques we evaluated.