Portable section-level tuning of compiler parallelized applications
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Automatic parallelization combined with tuning techniques is an alternative to manual parallelization of sequential programs for exploiting the increased computational power of current multi-core systems. Automatic parallelization concentrates on finding any possible parallelism in a program, whereas tuning systems help identify efficient parallel code sections and serialize inefficient ones using runtime performance metrics. In this work, we study the performance gap between automatically and hand-parallelized OpenMP applications and investigate whether this gap can be closed by compile-time techniques or whether it requires dynamic or user-interactive solutions. We implement an empirical tuning framework and propose an algorithm that partitions a program into sections and tunes each section individually. Experiments show that, even in the worst case, tuned applications perform no worse than the original serial programs, and they sometimes outperform hand-parallelized applications. Our work is among the first to deliver an auto-parallelization system that guarantees performance improvements for nearly all programs; it thus eliminates the need for users to "experiment" with such tools to obtain the shortest runtime for their applications.
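The core tuning decision described above can be sketched as follows. This is a minimal illustration of per-section empirical tuning, not the paper's actual framework: `tune_sections`, the timing protocol, and the serial/parallel variant pairs are all hypothetical stand-ins for compiler-generated code versions.

```python
import time


def tune_sections(sections):
    """Empirically pick the faster variant of each code section.

    `sections` is a list of (serial_fn, parallel_fn) pairs, standing in
    for the serial and parallelized versions of a program section.
    Timing each variant and keeping the winner means an inefficient
    parallel section is serialized again, so the tuned program is never
    slower than the all-serial original (modulo measurement noise).
    """
    chosen = []
    for serial_fn, parallel_fn in sections:
        timings = {}
        for name, fn in (("serial", serial_fn), ("parallel", parallel_fn)):
            start = time.perf_counter()
            fn()  # execute this variant of the section once
            timings[name] = time.perf_counter() - start
        # Serialize the section unless the parallel variant actually won.
        chosen.append(
            "parallel" if timings["parallel"] < timings["serial"] else "serial"
        )
    return chosen
```

A real system would amortize these trial runs over repeated executions of each section and use more robust metrics than a single wall-clock sample, but the worst-case guarantee comes from exactly this fall-back-to-serial decision.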