To infinity and beyond?! On scaling performance measurement and analysis tools for parallel programming

  • Authors: Bernd Mohr
  • Affiliations: Forschungszentrum Jülich, John-von-Neumann Institute for Computing, Virtual Institute for High-Productivity Supercomputing, Jülich, Germany
  • Venue: PVM/MPI'07: Proceedings of the 14th European Conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
  • Year: 2007

Abstract

The number of processor cores available in high-performance computing systems is steadily increasing. A major factor is the current trend toward multi-core and many-core processor chip architectures. In the latest list of the TOP500 Supercomputer Sites[1], 63% of the systems listed have more than 1024 processor cores, and the average is about 2400. While this promises ever more compute power and memory capacity for tackling today's complex simulation problems, it forces application developers to greatly enhance the scalability of their codes in order to exploit it. This often requires new algorithms, methods or parallelization schemes to be developed, as many well-known and accepted techniques stop working at these large scales. It starts with simple things like opening a file per process to save checkpoint information, collecting the simulation results of the whole program via a gather operation on a single process, or previously unimportant O(n²) operations which quickly come to dominate the execution. Unfortunately, many of these performance problems only show up when executing with very high numbers of processes and cannot easily be diagnosed or predicted from measurements at lower numbers. Detecting and diagnosing these performance and scalability bottlenecks requires sophisticated performance instrumentation, measurement and analysis tools. Simple tools typically scale very well, but the information they provide proves less and less useful at these high scales. It is clear that tool developers face exactly the same problems as application developers when enhancing their tools to handle and support highly scalable applications. In this talk we discuss the major limitations of current state-of-the-art performance measurement, analysis and visualization methods and tools. We give an overview of experiments, new approaches and first results from performance tool projects that try to overcome these limits. This includes new scalable and enhanced result visualization methods used in the performance analysis framework TAU[2], methods to automatically extract key execution phases from long traces as used by the Paraver toolset[3], a more scalable client/server tool architecture like that of VampirServer[4] for scalable timeline visualizations, and highly parallel automatic performance bottleneck searches as utilized by the Scalasca toolset[5].
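
To make the gather bottleneck mentioned in the abstract concrete, the following minimal MPI sketch in C (a hypothetical illustration, not code from the talk) contrasts collecting all simulation results on a single process, whose memory and work grow linearly with the process count, with a reduction that aggregates the data in a tree-structured, logarithmic fashion. The LOCAL_N constant and buffer sizes are assumptions for illustration only.

    /* Hypothetical illustration: the "collect everything on one process"
     * pattern the abstract warns about, next to a scalable alternative. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define LOCAL_N 1024            /* results per process (assumed size) */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local[LOCAL_N];
        for (int i = 0; i < LOCAL_N; i++)
            local[i] = rank + i * 1e-6;     /* stand-in for real results */

        /* Non-scalable: rank 0 must buffer size * LOCAL_N values, so its
         * memory footprint and unpacking work grow as O(size). */
        double *all = NULL;
        if (rank == 0)
            all = malloc((size_t)size * LOCAL_N * sizeof(double));
        MPI_Gather(local, LOCAL_N, MPI_DOUBLE,
                   all, LOCAL_N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Scalable when only an aggregate is needed: a reduction keeps
         * per-process memory constant and lets the MPI library combine
         * values in a tree, i.e. in O(log size) steps. */
        double local_sum = 0.0, global_sum = 0.0;
        for (int i = 0; i < LOCAL_N; i++)
            local_sum += local[i];
        MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE,
                   MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            printf("p = %d, global sum = %g\n", size, global_sum);
            free(all);
        }
        MPI_Finalize();
        return 0;
    }

At the scales the abstract describes (1024 and more processes), the MPI_Gather pattern is exactly the kind of technique that stops working, while the reduction variant keeps per-process memory constant.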