Automated, scalable debugging of MPI programs with Intel® Message Checker

  • Authors: Jayant DeSouza; Bob Kuhn; Bronis R. de Supinski; Victor Samofalov; Sergey Zheltov; Stanislav Bratanov
  • Affiliations: Intel Corporation, Champaign, IL; Intel Corporation, Champaign, IL; Lawrence Livermore National Lab, Livermore, CA; Intel Russia; Intel Russia; Intel Russia
  • Venue: Proceedings of the second international workshop on Software engineering for high performance computing system applications
  • Year: 2005

Abstract

The trend towards many-core multi-processor systems and clusters will make systems with tens to hundreds of processors more widely available. Current manual debugging techniques do not scale well to such large systems. Advanced automated debugging tools are needed for standard programming models based on commodity computing, such as threads and MPI. We surveyed MPI users to identify the kinds of MPI errors that they encounter and classified the errors into several types. We describe how automated tools can detect such errors and present the Intel® Message Checker (IMC) technology being developed at the Intel Advanced Computing Center. IMC's unique technology automatically detects several kinds of MPI errors, including various types of mismatches, race conditions, deadlocks and potential deadlocks, and resource misuse. Finally, we review the usability and uniqueness of IMC and discuss our future plans.
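
To make one of the error classes named above concrete, the following Python sketch shows how a tool might flag a deadlock by looking for a cycle in a "wait-for" graph of blocked ranks. This is only an illustration under stated assumptions: the abstract does not describe how IMC detects deadlocks, and the function name `find_wait_cycle` and the `waits_for` mapping are hypothetical. The sketch also assumes each blocked rank waits on exactly one other rank, as with a blocking receive from a specific source.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return the ranks forming a cycle in the wait-for graph, or None.

    waits_for maps a blocked rank to the single rank it is waiting on.
    Ranks absent from the mapping are assumed not to be blocked.
    (Hypothetical data model, chosen for illustration only.)
    """
    finished = set()  # ranks already walked without finding a cycle
    for start in waits_for:
        path: List[int] = []           # ranks on the current walk, in order
        position: Dict[int, int] = {}  # rank -> index in path
        rank = start
        while rank in waits_for and rank not in finished:
            if rank in position:
                # We returned to a rank on the current walk: a cycle.
                return path[position[rank]:]
            position[rank] = len(path)
            path.append(rank)
            rank = waits_for[rank]
        finished.update(path)
    return None


# Two ranks that each block on a receive from the other before sending.
print(find_wait_cycle({0: 1, 1: 0}))  # prints [0, 1]
```

A chain such as `{0: 1, 1: 2}`, where rank 2 is not blocked, yields `None`, since rank 2 can still make progress and eventually release the others.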