HPC systems are notorious for operating at a small fraction of their peak performance, and the ongoing migration to multi-core and multi-socket compute nodes further complicates performance optimization. The readily available performance evaluation tools require considerable effort to learn and use, so most HPC application writers do not use them. As a remedy, we have developed PerfExpert, a tool that combines a simple user interface with a sophisticated analysis engine to detect probable core-, socket-, and node-level performance bottlenecks in each important procedure and loop of an application. For each bottleneck, PerfExpert provides a concise performance assessment and suggests steps the programmer can take to improve performance. These steps include compiler switches and optimization strategies with code examples. We have applied PerfExpert to several HPC production codes on the Ranger supercomputer. In all cases, it correctly identified the critical code sections and provided accurate assessments of their performance.