Scalable Parallel Program Debugging with Process Isolation and Grouping

  • Authors:
  • Dieter Kranzlmüller

  • Affiliations:
  • -

  • Venue:
  • IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
  • Year:
  • 2002

Abstract

Cyclic debugging of large-scale parallel programs is often ruled out by a program's size and runtime and the associated computing costs. The conventional workaround of scaling down the number of processes and/or the problem size is only partially satisfactory, because the program's behavior may differ significantly from its behavior at the original scale, and some errors may vanish completely. Process isolation offers a different solution: it allows single processes of an arbitrary parallel program to be extracted and executed on their own. By simulating the surroundings of the selected process, this method reproduces the program's full-scale behavior for that process. Combined with a grouping strategy, it reduces the number of actually running processes to a small group rather than a single process. This makes it possible to apply well-established debugging tools and their error-detection functionality to a manageable number of processes. In addition, the grouping mechanism makes it possible to adjust the size of the traces needed to simulate process interaction.
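
The idea of isolation with grouping can be illustrated with a minimal sketch. It is not the paper's implementation: the toy pipeline program, the trace format `(src, dst, payload)`, and the names `pipeline_step`, `run_full`, and `run_isolated` are all assumptions made for illustration. A full-scale run records every message; a later run executes only a chosen group of processes, serving messages from outside the group out of the trace and discarding messages that leave the group.

```python
from collections import defaultdict, deque


def pipeline_step(rank, size, send, recv):
    """Toy per-process program (hypothetical): a running sum passed down a pipeline."""
    value = rank if rank == 0 else recv(rank - 1) + rank
    if rank + 1 < size:
        send(rank + 1, value)
    return value


def run_full(size):
    """Execute every process and record each message as (src, dst, payload)."""
    mailbox = defaultdict(deque)  # (src, dst) -> pending payloads
    trace = []
    results = {}
    for rank in range(size):
        def send(dst, payload, src=rank):
            mailbox[(src, dst)].append(payload)
            trace.append((src, dst, payload))

        def recv(src, dst=rank):
            return mailbox[(src, dst)].popleft()

        results[rank] = pipeline_step(rank, size, send, recv)
    return results, trace


def run_isolated(group, size, trace):
    """Execute only the processes in `group`; the rest is simulated from the trace."""
    group = set(group)
    # Only messages that cross into the group have to be kept for replay.
    replay = defaultdict(deque)
    for src, dst, payload in trace:
        if dst in group and src not in group:
            replay[(src, dst)].append(payload)
    replayed_messages = sum(len(q) for q in replay.values())

    live = defaultdict(deque)  # messages exchanged inside the group
    results = {}
    for rank in sorted(group):
        def send(dst, payload, src=rank):
            if dst in group:
                live[(src, dst)].append(payload)
            # Messages leaving the group are discarded: nobody is there to receive them.

        def recv(src, dst=rank):
            source = live if src in group else replay
            return source[(src, dst)].popleft()

        results[rank] = pipeline_step(rank, size, send, recv)
    return results, replayed_messages


if __name__ == "__main__":
    full_results, trace = run_full(8)
    group_results, replayed = run_isolated([2, 3], 8, trace)
    print(group_results, "replayed messages:", replayed)
```

In this sketch, processes 2 and 3 compute the same values as in the 8-process run, although only two processes actually execute. The size of the replayed trace depends on how the group is chosen: messages between group members are exchanged live and need not be stored, which is one way to read the abstract's remark that grouping adjusts the trace size.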