A balanced approach to application performance tuning

  • Authors:
  • Souad Koliai; Stéphane Zuckerman; Emmanuel Oseret; Mickaël Ivascot; Tipp Moseley; Dinh Quang; William Jalby

  • Affiliations:
  • University of Versailles Saint-Quentin-en-Yvelines, France; University of Versailles Saint-Quentin-en-Yvelines, France; University of Versailles Saint-Quentin-en-Yvelines, France; University of Versailles Saint-Quentin-en-Yvelines, France; University of Versailles Saint-Quentin-en-Yvelines, France; Dassault-Aviation, France; University of Versailles Saint-Quentin-en-Yvelines, France

  • Venue:
  • LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
  • Year:
  • 2009

Abstract

Current hardware trends place increasing pressure on programmers and tools to optimize scientific code. Numerous tools and techniques exist, but no single tool is a panacea; instead, different tools have different strengths. Therefore, an assortment of performance tuning utilities and strategies is necessary to make the best use of scarce resources (e.g., bandwidth, functional units, cache). This paper describes a combined methodology for the optimization process. The strategy combines static assembly analysis using MAQAO with dynamic information from hardware performance monitoring (HPM) and memory traces. We introduce a new technique, decremental analysis (DECAN), to iteratively identify the individual instructions responsible for performance bottlenecks. We present case studies on applications from several independent software vendors (ISVs) on an SMP Xeon Core 2 platform. These strategies help uncover problems related to memory access locality and loop unrolling; fixing them yields a sequential performance improvement of a factor of 2.
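
The idea behind decremental analysis can be illustrated with a toy sketch: remove one instruction at a time from a loop body, re-measure, and attribute the drop in run time to the removed instruction. The code below is only an illustration under stated assumptions and is not the DECAN or MAQAO interface. The function names, the instruction labels, and the cost table are all hypothetical, and the additive cost model stands in for a real timed run of a patched binary.

```python
from typing import Callable, List, Sequence, Tuple


def decremental_analysis(
    instructions: Sequence[str],
    measure: Callable[[Sequence[str]], float],
) -> List[Tuple[str, float]]:
    """Rank instructions by how much the measured cost drops when each is removed."""
    baseline = measure(instructions)
    savings = []
    for i, instr in enumerate(instructions):
        # Build a variant of the loop body with instruction i left out.
        variant = list(instructions[:i]) + list(instructions[i + 1:])
        savings.append((instr, baseline - measure(variant)))
    # Largest saving first: the most likely bottleneck under this model.
    return sorted(savings, key=lambda pair: pair[1], reverse=True)


# Hypothetical per-instruction costs (arbitrary units); a real study would
# time the modified loop on hardware instead of summing a table.
TOY_COST = {
    "load a[i]": 4.0,
    "load b[stride*i]": 30.0,  # assumed poor locality
    "mul": 1.0,
    "store c[i]": 4.0,
}


def toy_measure(instrs: Sequence[str]) -> float:
    """Stand-in for a timed run: sums the assumed cost of each instruction."""
    return sum(TOY_COST[name] for name in instrs)


ranking = decremental_analysis(list(TOY_COST), toy_measure)
for name, saved in ranking:
    print(f"{name:20s} saves {saved:5.1f}")
```

In this toy model the strided load dominates the ranking, which mirrors the kind of memory access locality problem the abstract mentions. On real hardware, instruction costs overlap and are not additive, so measured savings would need more careful interpretation.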