A case for machine learning to optimize multicore performance

  • Authors:
  • Archana Ganapathi; Kaushik Datta; Armando Fox; David Patterson

  • Affiliations:
  • University of California at Berkeley; University of California at Berkeley; University of California at Berkeley; University of California at Berkeley

  • Venue:
  • HotPar'09: Proceedings of the First USENIX Conference on Hot Topics in Parallelism
  • Year:
  • 2009

Abstract

Multicore architectures have become so complex and diverse that there is no obvious path to achieving good performance. Hundreds of code transformations, compiler flags, architectural features, and optimization parameters result in a search space that can take many machine-months to explore exhaustively. Inspired by successes in the systems community, we apply state-of-the-art machine learning techniques to explore this space more intelligently. On 7-point and 27-point stencil code, our technique takes about two hours to discover a configuration whose performance is within 1% of, and up to 18% better than, that achieved by a human expert. This factor-of-2000 speedup over manual exploration of the autotuning parameter space enables us to explore optimizations that were previously off-limits. We believe the opportunity for using machine learning in multicore autotuning is even more promising than the successes to date in the systems literature.