Parallelism has become increasingly commonplace with the advent of multicore processors. Although different parallel programming models have arisen to exploit the computing capabilities of such processors, developing applications that take advantage of them may not be easy. Worse, the performance achieved by the parallel version of an application may fall short of what the developer expected because of poor utilization of the resources offered by the processor. In this paper we present a fruitful synergy between a shared-memory parallel compiler and runtime, and a performance extraction library. The objective of this work is not only to shorten the performance analysis life cycle when parallelizing an application, but also to enrich the analysis of the parallel application by incorporating data that is known only on the compiler and runtime side. Additionally, we present performance results obtained by executing instrumented applications and evaluate the overhead of the instrumentation.