Performance analysis tools are critical for the effective use of large parallel computing resources, but existing tools have failed to address three problems that limit their scalability: (1) management and processing of the volume of performance data generated when monitoring a large number of application processes, (2) communication between a large number of tool components, and (3) presentation of performance data and analysis results for applications with a large number of processes. In this paper, we present a novel approach for finding performance problems in applications with a large number of processes that leverages our multicast and data aggregation infrastructure to address these three performance tool scalability barriers.

First, we show how to design a scalable, distributed performance diagnosis facility. We demonstrate this design with an on-line, automated strategy for finding performance bottlenecks. Our strategy uses distributed, independent bottleneck search agents located in the tool agent processes that monitor running application processes. Second, we present a technique for constructing compact displays of the results of our bottleneck detection strategy. This technique, called the Sub-Graph Folding Algorithm, presents bottleneck search results using dynamic graphs that record the refinement of a bottleneck search. The complexity of the results graph is controlled by combining sub-graphs showing similar local application behavior into a composite sub-graph.

Using an approach that combines these two synergistic parts, we performed bottleneck searches on programs with up to 1024 processes with no sign of tool resource saturation. With 1024 application processes, our visualization technique reduced a search results graph containing over 30,000 nodes to a single 44-node composite sub-graph showing the same qualitative performance information as the original graph.
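The folding idea can be illustrated with a small sketch. The abstract does not give the algorithm's details, so the following is only a simplified stand-in, not the Sub-Graph Folding Algorithm itself. It assumes each process's search result is a tree of labels and treats two nodes as "similar" only when they carry the same label at the same position. The hypothesis and function names (`ExcessiveSyncWaitingTime`, `exchange_halo`, and so on) are hypothetical placeholders chosen for the example.

```python
def count_nodes(tree):
    """Count the nodes of a nested-dict tree, where every key is one node."""
    return sum(1 + count_nodes(children) for children in tree.values())


def merge_into(composite, tree):
    """Merge one per-process tree into the composite tree, in place.

    Nodes with the same label at the same position are combined, and each
    composite node records how many processes contributed to it.
    """
    for label, children in tree.items():
        node = composite.setdefault(label, {"count": 0, "children": {}})
        node["count"] += 1
        merge_into(node["children"], children)


def fold(trees):
    """Fold a list of per-process trees into one composite tree."""
    composite = {}
    for tree in trees:
        merge_into(composite, tree)
    return composite


def count_composite(composite):
    """Count the nodes of a composite tree built by fold()."""
    return sum(1 + count_composite(n["children"]) for n in composite.values())


def make_result(io_bound):
    """Build a hypothetical per-process search result (assumed shape)."""
    tree = {"ExcessiveSyncWaitingTime": {"main": {"exchange_halo": {}}}}
    if io_bound:
        tree["ExcessiveIOBlockingTime"] = {"main": {"write_checkpoint": {}}}
    return tree


# 1024 simulated processes; a few of them also show an I/O bottleneck.
results = [make_result(io_bound=(rank % 256 == 0)) for rank in range(1024)]
unfolded_size = sum(count_nodes(t) for t in results)
composite = fold(results)
folded_size = count_composite(composite)
print(unfolded_size, folded_size)
```

In this toy setting the composite tree stays small however many processes are added, because processes with the same behavior collapse onto the same nodes. The per-node `count` keeps a record of how many processes share each finding. A real implementation would need a richer notion of similarity than exact label equality.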