Automatic performance debugging of SPMD-style parallel programs

Authors:
Xu Liu; Jianfeng Zhan; Kunlin Zhan; Weisong Shi; Lin Yuan; Dan Meng; Lei Wang
Affiliations:
Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China and Department of Computer Science, Rice University, United States; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Graduate University of Chinese Academy of Sciences, China; Department of Computer Science, Wayne State University, United States; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China and Graduate University of Chinese Academy of Sciences, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
Venue:
Journal of Parallel and Distributed Computing
Year:
2011

Abstract

Automatic performance debugging of parallel applications involves two main steps: locating performance bottlenecks and uncovering their root causes so that the program can be optimized. Previous work falls short in two ways. First, several efforts automate bottleneck location, but they can identify only those performance problems that are specified by a priori knowledge. Second, several tools apply exploratory or confirmatory data analysis to automatically discover relationships among performance data, but they do not focus on locating bottlenecks or uncovering their root causes. The single program, multiple data (SPMD) programming model is widely used in both high performance computing and Cloud computing. In this paper, we design and implement an innovative system, AutoAnalyzer, that automates the process of debugging performance problems in SPMD-style parallel programs, including data collection, performance behavior analysis, bottleneck location, and root-cause discovery. AutoAnalyzer has two distinguishing features: first, without any prior knowledge, it automatically locates bottlenecks and uncovers their root causes for performance optimization; second, it is lightweight in terms of the amount of performance data that must be collected and analyzed.
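
In an SPMD program every process executes the same code, so processes would be expected to show similar performance profiles, and a process that behaves differently is a natural bottleneck candidate that can be found without prior knowledge of specific problems. The abstract does not detail AutoAnalyzer's clustering algorithms, so the sketch below is only a hedged illustration of that premise. It assumes each MPI rank is summarized by a vector of per-code-region times, and it substitutes a plain single-linkage clustering with a hypothetical distance `threshold`. The function names `euclidean` and `cluster_processes` are illustrative and are not taken from the paper.

```python
import math
from typing import Dict, List


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length metric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_processes(profiles: Dict[int, List[float]], threshold: float) -> List[List[int]]:
    """Hypothetical stand-in, not the paper's algorithm: single-linkage
    clustering of ranks via union-find. Two ranks share a cluster if a chain
    of pairwise distances <= threshold connects them. Clusters are returned
    largest first, with ranks in ascending order inside each cluster."""
    ranks = sorted(profiles)
    parent = {r: r for r in ranks}

    def find(r: int) -> int:
        # Follow parent links to the cluster representative (path halving).
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    # Merge every pair of ranks whose profiles are close enough.
    for i, a in enumerate(ranks):
        for b in ranks[i + 1:]:
            if euclidean(profiles[a], profiles[b]) <= threshold:
                parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for r in ranks:
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    # Hypothetical per-rank times (seconds) in three code regions.
    profiles = {
        0: [1.0, 2.0, 0.5],
        1: [1.1, 2.1, 0.5],
        2: [1.0, 1.9, 0.6],
        3: [1.0, 2.0, 3.5],  # rank 3 spends far longer in the third region
    }
    clusters = cluster_processes(profiles, threshold=0.5)
    print(clusters)  # [[0, 1, 2], [3]]
    if len(clusters) > 1:
        print("process behavior dissimilarity detected")
```

Getting more than one cluster signals dissimilar process behavior. The minority cluster (rank 3 here) and the vector dimension in which it differs (the third code region) show where to look next. The threshold is an assumed tuning knob. A real tool would need to choose or adapt it to the scale of the collected metrics.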
Our contributions are threefold. First, we propose two effective clustering algorithms that determine whether performance bottlenecks exist that cause dissimilarity in process behavior or disparity in code region behavior, respectively, and we present two search algorithms to locate those bottlenecks. Second, on the basis of rough set theory, we propose an innovative approach to automatically uncover the root causes of bottlenecks. Third, on cluster systems with two different configurations, we use two production applications written in Fortran 77 and one open-source code, MPIBZIP2 (http://compression.ca/mpibzip2/), written in C++, to verify the effectiveness and correctness of our methods. For the three applications, we also propose an experimental approach to investigating the effects of different metrics on locating bottlenecks.
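
The abstract does not explain how rough set theory is applied to uncover root causes. As a hedged illustration, the sketch below computes two standard rough-set quantities that such an approach could build on. The first is the dependency degree gamma(C, D) = |POS_C(D)| / |U|. The second is each condition attribute's significance, defined as the drop in dependency when that attribute is removed. The sketch makes several assumptions: each code region is one row, metrics have already been discretized to "high"/"low", and a decision attribute marks the regions flagged as bottlenecks. The metric names `l2_miss`, `mpi_wait`, and `ipc` and the table contents are hypothetical. This is not the authors' algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Row = Dict[str, str]


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Indiscernibility classes: groups of row indices that agree on every attribute in attrs."""
    classes: Dict[tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows: Sequence[Row], cond: Sequence[str], decision: str) -> float:
    """gamma(C, D) = |POS_C(D)| / |U|: the fraction of rows whose class under
    cond maps to a single decision value (the positive region)."""
    if not rows:
        return 0.0
    positive = 0
    for cls in partition(rows, cond):
        if len({rows[i][decision] for i in cls}) == 1:
            positive += len(cls)
    return positive / len(rows)


def significance(rows: Sequence[Row], cond: Sequence[str], decision: str) -> Dict[str, float]:
    """Drop in dependency when each condition attribute is removed; a larger
    drop means the attribute is more indispensable for explaining the decision."""
    full = dependency(rows, cond, decision)
    return {a: full - dependency(rows, [c for c in cond if c != a], decision) for a in cond}


# Hypothetical information table: one row per code region, metrics discretized
# to "high"/"low", and a decision attribute marking bottleneck regions.
REGIONS: List[Row] = [
    {"l2_miss": "high", "mpi_wait": "low", "ipc": "low", "bottleneck": "yes"},
    {"l2_miss": "high", "mpi_wait": "low", "ipc": "high", "bottleneck": "yes"},
    {"l2_miss": "low", "mpi_wait": "low", "ipc": "low", "bottleneck": "no"},
    {"l2_miss": "low", "mpi_wait": "high", "ipc": "high", "bottleneck": "no"},
]
METRICS = ["l2_miss", "mpi_wait", "ipc"]

if __name__ == "__main__":
    print(significance(REGIONS, METRICS, "bottleneck"))
    # {'l2_miss': 0.5, 'mpi_wait': 0.0, 'ipc': 0.0}
```

In this toy table, all three metrics together fully determine the bottleneck label (gamma = 1.0). Removing `l2_miss` drops gamma to 0.5, while removing either of the other metrics changes nothing. Under these assumptions, `l2_miss` would therefore be the candidate root cause. With real data, the outcome also depends on how the metrics are discretized.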