Automatic performance debugging of SPMD-style parallel programs

  • Authors:
  • Xu Liu;Jianfeng Zhan;Kunlin Zhan;Weisong Shi;Lin Yuan;Dan Meng;Lei Wang

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China and Department of Computer Science, Rice University, United States;Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;Graduate University of Chinese Academy of Sciences, China;Department of Computer Science, Wayne State University, United States;Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China and Graduate University of Chinese Academy of Sciences, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2011

Abstract

Automatic performance debugging of parallel applications involves two main steps: locating performance bottlenecks and uncovering their root causes so that they can be optimized. Previous work falls short in two ways. First, several efforts automate locating bottlenecks but present results in a confined way, identifying only performance problems that are known a priori. Second, several tools use exploratory or confirmatory data analysis to automatically discover relationships in performance data, but they do not focus on locating bottlenecks or uncovering their root causes. The single program, multiple data (SPMD) programming model is widely used in both high-performance computing and cloud computing. In this paper, we design and implement AutoAnalyzer, a system that automates the process of debugging performance problems of SPMD-style parallel programs, including data collection, performance behavior analysis, locating bottlenecks, and uncovering their root causes. AutoAnalyzer has two distinguishing features: first, it locates bottlenecks and uncovers their root causes automatically, without any prior knowledge; second, it is lightweight in terms of the size of the performance data that must be collected and analyzed.
Our contributions are three-fold. First, we propose two effective clustering algorithms that detect the existence of performance bottlenecks causing process behavior dissimilarity or code region behavior disparity, respectively, and two searching algorithms that locate those bottlenecks. Second, on the basis of rough set theory, we propose an innovative approach to automatically uncovering the root causes of bottlenecks. Third, on cluster systems with two different configurations, we use two production applications written in Fortran 77 and one open-source code, MPIBZIP2 (http://compression.ca/mpibzip2/), written in C++, to verify the effectiveness and correctness of our methods. For the three applications, we also propose an experimental approach to investigating the effects of different metrics on locating bottlenecks.
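
The abstract does not spell out how rough set theory is applied, so the following is only a minimal sketch of one standard rough-set operation, computing the core attributes of a decision table, which may help illustrate the general idea of root-cause discovery. The table contents, the metric names (`l2_miss`, `net_io`, `disk_io`), and the function names are hypothetical and are not taken from AutoAnalyzer.

```python
def is_consistent(rows, attrs, decision):
    """Return True if rows that agree on every attribute in `attrs`
    also agree on the decision value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in attrs)
        # Remember the first decision seen for this key; any later
        # row with the same key must carry the same decision.
        if seen.setdefault(key, row[decision]) != row[decision]:
            return False
    return True


def core_attributes(rows, attrs, decision):
    """Return the attributes whose removal makes the table inconsistent.

    In rough-set terms these are the indispensable (core) condition
    attributes: the decision cannot be explained without them.
    """
    if not is_consistent(rows, attrs, decision):
        raise ValueError("decision table is inconsistent")
    return [
        a for a in attrs
        if not is_consistent(rows, [b for b in attrs if b != a], decision)
    ]


# Hypothetical decision table: one row per code region, with
# discretized metrics and a flag saying whether it is a bottleneck.
TABLE = [
    {"l2_miss": "high", "net_io": "low",  "disk_io": "low",  "bottleneck": True},
    {"l2_miss": "high", "net_io": "high", "disk_io": "low",  "bottleneck": True},
    {"l2_miss": "low",  "net_io": "high", "disk_io": "low",  "bottleneck": False},
    {"l2_miss": "low",  "net_io": "low",  "disk_io": "high", "bottleneck": False},
    {"l2_miss": "low",  "net_io": "low",  "disk_io": "low",  "bottleneck": False},
]

if __name__ == "__main__":
    print(core_attributes(TABLE, ["l2_miss", "net_io", "disk_io"], "bottleneck"))
```

In this made-up table, dropping `l2_miss` leaves two regions with identical remaining metrics but different bottleneck flags, so `l2_miss` is the only core attribute and would be the candidate root cause under these assumptions.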