Intel Xeon Phi coprocessors based on the Many Integrated Core (MIC) architecture are starting to appear in HPC systems, with Stampede being a prominent example available within the XSEDE cyber-infrastructure. Porting MPI and OpenMP applications to such systems often requires no more than simple recompilation. However, execution performance must be carefully analyzed and tuned to exploit their unique capabilities effectively. Performance measurement and analysis tools need to support the variety of execution modes in a consistent and convenient manner. This applies especially to execution configurations involving large numbers of compute nodes, each with several multicore host processors and many-core coprocessors. We report early experience using the open-source Scalasca toolset for runtime summarization and automatic trace analysis of the NPB BT-MZ MPI+OpenMP parallel application on Stampede, and discuss ongoing and future work.