On performance analysis of a multithreaded application parallelized by different programming models using Intel VTune

Authors:
Ami Marowka
Affiliations:
Department of Computer Science, College of Exact Sciences, Bar-Ilan University, Ramat Gan, Israel
Venue:
PaCT'11: Proceedings of the 11th International Conference on Parallel Computing Technologies
Year:
2011

Abstract

Multi-core processors are ubiquitous. Extracting the desired performance from them requires efficient techniques for partitioning a single piece of work into multiple fine-grained units that can be processed simultaneously. Understanding the performance behavior of a parallel system requires close familiarity with the underlying architecture and its hardware counters. We present a performance analysis study of a multi-core system using a state-of-the-art parallel performance analyzer, the Intel VTune Performance Analyzer. As a test case, we chose a classic nested-loop application that exhibits unexpected performance gains when parallelized with two different programming models on the same multi-core system. We expected to be able to explain these results by exploring the application's behavior with the analyzer. Instead, we found that it is very difficult to explain high-level performance measurements of multi-core systems through low-level hardware diagnosis.