Debugging of heterogeneous parallel systems

  • Authors:
  • Alessandro Forin

  • Affiliations:
  • Carnegie-Mellon Univ., Pittsburgh, PA

  • Venue:
  • PADD '88 Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging
  • Year:
  • 1988

Quantified Score

Hi-index 0.00

Visualization

Abstract

The Agora system supports the development of heterogeneous parallel programs, e.g. programs written in multiple languages and running on heterogeneous machines. Agora has been used since September 1986 in a large distributed system [1]: Two versions of the application have been demonstrated in one year, contrary to the expectation of two years per one version. The simplicity in debugging is one of the reasons of the productivity speedup gained. This simplicity is due both to the deeper understanding that the debugger has of parallel systems, and to a novel feature: the ability to replay the execution of parallel systems built with Agora. A user is able to exactly repeat for any number of times and at a slower pace an execution that failed. This makes it easy to identify time-dependent errors, which are peculiar to parallel and distributed systems. The debugger can also be customized to support user defined synchronization primitives, which are built on top of the system provided ones. The Agora debugger tackles three set of problems that no parallel debugger in the past has simultaneously addressed: dealing with programming-in-the-large, multiple processes in different languages, and multiple machine architectures.