On-line automated performance diagnosis on thousands of processes

Authors:
Philip C. Roth; Barton P. Miller
Affiliations:
Oak Ridge National Laboratory, Oak Ridge, TN; University of Wisconsin-Madison, Madison, WI
Venue:
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2006

Abstract

Performance analysis tools are critical for the effective use of large parallel computing resources, but existing tools have failed to address three problems that limit their scalability: (1) management and processing of the volume of performance data generated when monitoring a large number of application processes, (2) communication between a large number of tool components, and (3) presentation of performance data and analysis results for applications with a large number of processes. In this paper, we present a novel approach, built on our multicast and data aggregation infrastructure, for finding performance problems in applications with a large number of processes; it addresses all three of these performance tool scalability barriers.

First, we show how to design a scalable, distributed performance diagnosis facility. We demonstrate this design with an on-line, automated strategy for finding performance bottlenecks. Our strategy uses distributed, independent bottleneck search agents located in the tool agent processes that monitor running application processes. Second, we present a technique for constructing compact displays of the results of our bottleneck detection strategy. This technique, called the Sub-Graph Folding Algorithm, presents bottleneck search results using dynamic graphs that record the refinement of a bottleneck search. The complexity of the results graph is controlled by combining sub-graphs that show similar local application behavior into a composite sub-graph.

Combining these two synergistic parts, we performed bottleneck searches on programs with up to 1024 processes with no sign of tool resource saturation. With 1024 application processes, our visualization technique reduced a search results graph containing over 30,000 nodes to a single composite sub-graph of 44 nodes that shows the same qualitative performance information as the original graph.
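The folding idea can be illustrated with a small Python sketch. It is not the paper's Sub-Graph Folding Algorithm; it is a simplified model that assumes each process contributes a tree of refinement steps, and that two per-process sub-graphs count as "similar" only when they have identical shape, identical process-independent labels, and identical test outcomes. The names `Node`, `signature`, `fold`, `count_nodes`, and `make_subgraph`, and the hypothesis labels, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One refinement step in a per-process search-results sub-graph (hypothetical model)."""
    label: str            # hypothesis/focus label with the process identity stripped out
    is_bottleneck: bool   # outcome of testing this hypothesis
    children: list[Node] = field(default_factory=list)


def signature(node: Node) -> tuple:
    """Order-independent key capturing a sub-graph's shape, labels, and outcomes."""
    child_sigs = tuple(sorted(signature(c) for c in node.children))
    return (node.label, node.is_bottleneck, child_sigs)


def fold(per_process: dict[str, Node]) -> list[tuple[Node, list[str]]]:
    """Merge per-process sub-graphs with equal signatures into composite sub-graphs.

    Assumption: "similar" means "identical signature". Returns one
    (representative sub-graph, member processes) pair per composite.
    """
    members: dict[tuple, list[str]] = {}
    representative: dict[tuple, Node] = {}
    for proc, root in sorted(per_process.items()):
        sig = signature(root)
        members.setdefault(sig, []).append(proc)
        representative.setdefault(sig, root)
    return [(representative[sig], procs) for sig, procs in members.items()]


def count_nodes(node: Node) -> int:
    """Number of nodes in the sub-graph rooted at `node`."""
    return 1 + sum(count_nodes(c) for c in node.children)


def make_subgraph(sync_bottleneck: bool) -> Node:
    # Hypothetical two-step refinement: a synchronization hypothesis refined to one call site.
    return Node("ExcessiveSyncWait", sync_bottleneck,
                [Node("ExcessiveSyncWait@MPI_Allreduce", sync_bottleneck)])


# Usage: 1024 simulated processes; every 256th one shows no synchronization bottleneck.
per_process = {f"p{i:04d}": make_subgraph(i % 256 != 0) for i in range(1024)}
folded = fold(per_process)
before = sum(count_nodes(root) for root in per_process.values())
after = sum(count_nodes(rep) for rep, _ in folded)
print(f"{before} nodes -> {after} nodes in {len(folded)} composites")
# prints "2048 nodes -> 4 nodes in 2 composites"
```

Because the signature ignores process identity and child order, each process's sub-graph is placed into a composite with one hash lookup after its signature is computed. A looser notion of similarity, for example one that tolerates differences in a few nodes, would need a different grouping step.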