Portable section-level tuning of compiler parallelized applications
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Automatic parallelization combined with tuning techniques is an alternative to manual parallelization of sequential programs for exploiting the increased computational power of current multi-core systems. Automatic parallelization concentrates on finding any possible parallelism in a program, whereas tuning systems help identify efficient parallel code sections and serialize inefficient ones using runtime performance metrics. In this work, we study the performance gap between automatically and hand-parallelized OpenMP applications and investigate whether this gap can be closed by compile-time techniques or whether it requires dynamic or user-interactive solutions. We implement an empirical tuning framework and propose an algorithm that partitions a program into sections and tunes each section individually. Experiments show that, even in the worst case, tuned applications perform no worse than the original serial programs, and they sometimes outperform hand-parallelized applications. Our work is among the first to deliver an auto-parallelization system that guarantees performance improvements for nearly all programs; it thus eliminates the need for users to "experiment" with such tools to obtain the shortest runtime for their applications.
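The core tuning decision described above can be sketched as follows. This is a minimal illustration of per-section empirical tuning, not the paper's actual framework: `tune_sections`, the timing protocol, and the serial/parallel variant pairs are all hypothetical stand-ins for compiler-generated code versions.

```python
import time


def tune_sections(sections):
    """Empirically pick the faster variant of each code section.

    `sections` is a list of (serial_fn, parallel_fn) pairs, standing in
    for the serial and parallelized versions of a program section.
    Timing each variant and keeping the winner means an inefficient
    parallel section is serialized again, so the tuned program is never
    slower than the all-serial original (modulo measurement noise).
    """
    chosen = []
    for serial_fn, parallel_fn in sections:
        timings = {}
        for name, fn in (("serial", serial_fn), ("parallel", parallel_fn)):
            start = time.perf_counter()
            fn()  # execute this variant of the section once
            timings[name] = time.perf_counter() - start
        # Serialize the section unless the parallel variant actually won.
        chosen.append(
            "parallel" if timings["parallel"] < timings["serial"] else "serial"
        )
    return chosen
```

A real system would amortize these trial runs over repeated executions of each section and use more robust metrics than a single wall-clock sample, but the worst-case guarantee comes from exactly this fall-back-to-serial decision.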