A balanced approach to application performance tuning

Authors:
Souad Koliai;Stéphane Zuckerman;Emmanuel Oseret;Mickaël Ivascot;Tipp Moseley;Dinh Quang;William Jalby
Affiliations:
University of Versailles Saint-Quentin-en-Yvelines, France;University of Versailles Saint-Quentin-en-Yvelines, France;University of Versailles Saint-Quentin-en-Yvelines, France;University of Versailles Saint-Quentin-en-Yvelines, France;University of Versailles Saint-Quentin-en-Yvelines, France;Dassault-Aviation, France;University of Versailles Saint-Quentin-en-Yvelines, France
Venue:
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Year:
2009


Abstract

Current hardware trends place increasing pressure on programmers and tools to optimize scientific code. Numerous tools and techniques exist, but no single tool is a panacea; each has different strengths. An assortment of performance tuning utilities and strategies is therefore necessary to make the best use of scarce resources (e.g., bandwidth, functional units, cache). This paper describes a combined methodology for the optimization process. The strategy combines static assembly analysis using MAQAO with dynamic information from hardware performance monitoring (HPM) and memory traces. We introduce a new technique, decremental analysis (DECAN), to iteratively identify the individual instructions responsible for performance bottlenecks. We present case studies on applications from several independent software vendors (ISVs) on an SMP Xeon Core 2 platform. These strategies help uncover problems related to memory access locality and loop unrolling; addressing them leads to a sequential performance improvement of a factor of 2.
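
The abstract names decremental analysis but does not spell out its mechanics. The sketch below is one possible reading, not the authors' implementation: it removes one instruction at a time from a loop body, re-measures, and ranks instructions by the time saved. The names `decremental_ranking`, `toy_measure`, and `TOY_COST` are hypothetical. The additive cost table stands in for real timing runs of patched binaries, and its values are invented for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def decremental_ranking(
    instructions: Sequence[str],
    measure: Callable[[Sequence[str]], float],
) -> List[Tuple[str, float]]:
    """Rank instructions by the time saved when each is removed on its own."""
    baseline = measure(instructions)
    savings = []
    for i, instr in enumerate(instructions):
        # Build a loop variant with exactly one instruction deleted.
        variant = list(instructions[:i]) + list(instructions[i + 1:])
        savings.append((instr, baseline - measure(variant)))
    # Largest saving first: these are the likeliest bottleneck candidates.
    savings.sort(key=lambda pair: pair[1], reverse=True)
    return savings


# Hypothetical per-instruction costs (assumption: purely additive, which
# real hardware with overlapping execution does not guarantee).
TOY_COST = {"load a[i]": 4.0, "load b[i]": 4.0, "mul": 1.0, "store c[i]": 6.0}


def toy_measure(instrs: Sequence[str]) -> float:
    """Stand-in for timing a loop variant; sums the invented costs."""
    return sum(TOY_COST[name] for name in instrs)


ranking = decremental_ranking(list(TOY_COST), toy_measure)
for name, saved in ranking:
    print(f"{name}: {saved}")
```

In this toy model the store ranks first and the multiply last. On real hardware the savings would generally not be additive, so each variant would have to be timed rather than computed.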