Extending a traditional debugger to debug massively parallel applications

  • Authors:
  • Susanne M. Balle;Bevin R. Brett;Chih-Ping Chen;David LaFrance-Linden

  • Affiliations:
  • Hewlett-Packard, MS ZKO2-3/Q08, 110 Spit Brook Road, Nashua, NH;Hewlett-Packard, MS ZKO2-3/Q08, 110 Spit Brook Road, Nashua, NH and Intel Computer Corporation, 110 Spit Brook Road, Nashua, NH;Hewlett-Packard, MS ZKO2-3/Q08, 110 Spit Brook Road, Nashua, NH and Intel Computer Corporation, 110 Spit Brook Road, Nashua, NH;Hewlett-Packard, MS ZKO2-3/Q08, 110 Spit Brook Road, Nashua, NH

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2004

Abstract

Beowulf systems, and other proprietary approaches, are placing systems with four or more CPUs in the hands of many researchers and commercial users. In the near future, systems with hundreds of CPUs will become commonly available, with some programmers dealing with tens of thousands of CPUs. The debugging methods used on these systems are a combination of the traditional methods used for debugging single processes and ad hoc methods to help the user cope with the multitude of processes. Programmers are usually familiar with a single-process debugger and would like to use it (with minimal user-visible extensions) to debug their distributed program.

We present a set of modifications to a traditional debugger that make it capable of debugging applications running on thousands of processes. Our parallel debugger is composed of individual, fully functional debuggers connected by an n-ary aggregating network. This permits us to present to users the results from every debugger at the same time in an aggregated fashion. Users get a global view of the application and can easily see whether a given parameter has a value that differs from what they expect or from its value in the other processes. Users can then focus on the process sets of interest and investigate the problem.

One challenge when debugging thousands of processes is dealing with the amount of output coming from all the debuggers. We present methods to aggregate the overwhelming amount of output from the debuggers into a more manageable subset, which is presented to the user without losing information.

Experiments show that the debugger scales to thousands of processors. The startup mechanism and the response time of users' commands both scale well. The conclusions presented regarding the architecture and the new parallel debugger's scalability are not specific to the serial debugger used in our example implementation.
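The aggregation idea described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the function names, the `fanout` parameter, and the `start:end` rank-range notation are all assumptions made for illustration. The sketch groups identical output lines from per-process debuggers, merges the groups level by level through a tree with a fixed fan-out, and prints each distinct output once, labeled with the set of process ranks that produced it, so no information is dropped.

```python
from collections import defaultdict


def compress_ranks(ranks):
    """Render ranks as compact ranges (hypothetical notation), e.g. {0,1,2,5} -> '0:2,5'."""
    ranks = sorted(ranks)
    parts = []
    start = prev = ranks[0]
    for r in ranks[1:]:
        if r == prev + 1:
            prev = r
            continue
        parts.append(str(start) if start == prev else f"{start}:{prev}")
        start = prev = r
    parts.append(str(start) if start == prev else f"{start}:{prev}")
    return ",".join(parts)


def merge(children):
    """Combine partial aggregates (output text -> set of ranks) at one tree node."""
    merged = defaultdict(set)
    for child in children:
        for text, ranks in child.items():
            merged[text] |= ranks
    return dict(merged)


def aggregate(outputs, fanout=4):
    """Reduce per-process outputs through a tree with the given fan-out.

    outputs maps a process rank to the output string of that rank's debugger.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Leaves: one partial aggregate per process.
    level = [{text: {rank}} for rank, text in sorted(outputs.items())]
    # Each pass merges up to `fanout` neighbouring nodes into one parent node.
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0] if level else {}


def render(aggregated):
    """Format each distinct output once, prefixed by the ranks that produced it."""
    return sorted(f"[{compress_ranks(ranks)}] {text}" for text, ranks in aggregated.items())


if __name__ == "__main__":
    # Eight processes print the same variable; rank 5 disagrees with the rest.
    outputs = {rank: "x = 10" for rank in range(8)}
    outputs[5] = "x = 42"
    for line in render(aggregate(outputs)):
        print(line)
```

In this toy run the eight responses collapse to two lines, one for the seven agreeing ranks and one for rank 5, which is the kind of outlier a global view is meant to expose. A real system would perform each merge on a separate node of the network rather than in a single process.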