Taming hardware event samples for FDO compilation

  • Authors:
  • Dehao Chen;Neil Vachharajani;Robert Hundt;Shih-wei Liao;Vinodha Ramasamy;Paul Yuan;Wenguang Chen;Weimin Zheng

  • Affiliations:
  • Tsinghua University, Beijing, China;Google, Mountain View, CA, USA;Google, Mountain View, CA, USA;Google, Mountain View, CA, USA;-;Peking University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China

  • Venue:
  • Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
  • Year:
  • 2010

Quantified Score

Hi-index 0.01

Visualization

Abstract

Feedback-directed optimization (FDO) is effective in improving application runtime performance, but has not been widely adopted due to the tedious dual-compilation model, the difficulties in generating representative training data sets, and the high runtime overhead of profile collection. The use of hardware-event sampling to generate estimated edge profiles overcomes these drawbacks. Yet, hardware event samples are typically not precise at the instruction or basic-block granularity. These inaccuracies lead to missed performance when compared to instrumentation-based FDO@. In this paper, we use multiple hardware event profiles and supervised learning techniques to generate heuristics for improved precision of basic-block-level sample profiles, and to further improve the smoothing algorithms used to construct edge profiles. We demonstrate that sampling-based FDO can achieve an average of 78% of the performance gains obtained using instrumentation-based exact edge profiles for SPEC2000 benchmarks, matching or beating instrumentation-based FDO in many cases. The overhead of collection is only 0.74% on average, while compiler based instrumentation incurs 6.8%-53.5% overhead (and 10x overhead on an industrial web search application), and dynamic instrumentation incurs 28.6%-1639.2% overhead.