Dynamic control of performance monitoring on large scale parallel systems
ICS '93 Proceedings of the 7th international conference on Supercomputing
OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Improving online performance diagnosis by the use of historical performance data
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Proceedings of the 14th international conference on Supercomputing
HPCVIEW: A Tool for Top-down Analysis of Node Performance
The Journal of Supercomputing
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Deep Start: A Hybrid Strategy for Automated Performance Problem Searches
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
A Callgraph-Based Search Strategy for Automated Performance Diagnosis (Distinguished Paper)
Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
The SPMD Model: Past, Present and Future
Proceedings of the 8th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Scalable analysis techniques for microprocessor performance counter metrics
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
ICPPW '02 Proceedings of the 2002 International Conference on Parallel Processing Workshops
Automatic performance analysis of hybrid MPI/OpenMP applications
Journal of Systems Architecture: the EUROMICRO Journal - Special issue: Evolutions in parallel distributed and network-based processing
PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
The Tau Parallel Performance System
International Journal of High Performance Computing Applications
Knowledge engineering for automatic parallel performance diagnosis: Research Articles
Concurrency and Computation: Practice & Experience - European–American Working Group on Automatic Performance Analysis (APART)
Automatic analysis of inefficiency patterns in parallel applications: Research Articles
Concurrency and Computation: Practice & Experience - European–American Working Group on Automatic Performance Analysis (APART)
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
Knowledge support and automation for performance analysis with PerfExplorer 2.0
Scientific Programming - Large-Scale Programming Tools and Environments
Capturing performance knowledge for automated analysis
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Effective performance measurement and analysis of multithreaded applications
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Accurate Analytical Models for Message Passing on Multi-core Clusters
PDP '09 Proceedings of the 2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing
A scalable auto-tuning framework for compiler optimization
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
In cloud, do MTC or HTC service providers benefit from the economies of scale?
Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers
HPCTOOLKIT: tools for performance analysis of optimized parallel programs http://hpctoolkit.org
Concurrency and Computation: Practice & Experience - Scalable Tools for High-End Computing
Towards automatic optimization of MapReduce programs
Proceedings of the 1st ACM symposium on Cloud computing
In Cloud, Can Scientific Communities Benefit from the Economies of Scale?
IEEE Transactions on Parallel and Distributed Systems
Soft computing approach to performance analysis of parallel and distributed programs
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Performance analysis and optimization of MPI collective operations on multi-core clusters
The Journal of Supercomputing
Hi-index | 0.00 |
Automatic performance debugging of parallel applications includes two main steps: locating performance bottlenecks and uncovering their root causes for performance optimization. Previous work fails to resolve this challenging issue in two ways: first, several previous efforts automate locating bottlenecks, but present results in a confined way that only identifies performance problems with a priori knowledge; second, several tools take exploratory or confirmatory data analysis to automatically discover relevant performance data relationships, but these efforts do not focus on locating performance bottlenecks or uncovering their root causes. The simple program and multiple data (SPMD) programming model is widely used for both high performance computing and Cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of debugging performance problems of SPMD-style parallel programs, including data collection, performance behavior analysis, locating bottlenecks, and uncovering their root causes. AutoAnalyzer is unique in terms of two features: first, without any prior knowledge, it automatically locates bottlenecks and uncovers their root causes for performance optimization; second, it is lightweight in terms of the size of performance data to be collected and analyzed. Our contributions are three-fold: first, we propose two effective clustering algorithms to investigate the existence of performance bottlenecks that cause process behavior dissimilarity or code region behavior disparity, respectively; meanwhile, we present two searching algorithms to locate bottlenecks; second, on the basis of the rough set theory, we propose an innovative approach to automatically uncover root causes of bottlenecks; third, on the cluster systems with two different configurations, we use two production applications, written in Fortran 77, and one open source code-MPIBZIP2 (http://compression.ca/mpibzip2/), written in C++, to verify the effectiveness and correctness of our methods. For three applications, we also propose an experimental approach to investigating the effects of different metrics on locating bottlenecks.