Scalasca is a performance toolset that has been specifically designed to analyze parallel application execution behavior on large-scale systems with many thousands of processors. It offers an incremental performance-analysis procedure that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, adopting a strategy of successively refined measurement configurations. Distinctive features are its ability to identify wait states in applications with very large numbers of processes and to combine these with efficiently summarized local measurements. In this article, we review the current toolset architecture, emphasizing its scalable design and the role of the different components in transforming raw measurement data into knowledge of application execution behavior. The scalability and effectiveness of Scalasca are then surveyed from experience measuring and analyzing real-world applications on a range of computer systems. Copyright © 2010 John Wiley & Sons, Ltd.