Scalable timestamp synchronization for event traces of message-passing applications

  • Authors:
  • Daniel Becker;Rolf Rabenseifner;Felix Wolf;John C. Linford

  • Affiliations:
  • Jülich Supercomputing Centre, Forschungszentrum Jülich, 52425 Jülich, Germany and Department of Computer Science, RWTH Aachen University, 52056 Aachen, Germany;High Performance Computing Center, University of Stuttgart, 70550 Stuttgart, Germany;Jülich Supercomputing Centre, Forschungszentrum Jülich, 52425 Jülich, Germany and Department of Computer Science, RWTH Aachen University, 52056 Aachen, Germany;Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA

  • Venue:
  • Parallel Computing
  • Year:
  • 2009

Abstract

Event traces are helpful in understanding the performance behavior of message-passing applications since they allow in-depth analysis of communication and synchronization patterns. However, the absence of synchronized clocks may render the analysis ineffective because inaccurate relative event timings may misrepresent the logical event order and lead to errors when quantifying the impact of certain behaviors. Although linear offset interpolation can restore consistency to some degree, time-dependent drifts and other inaccuracies may still disarrange the original succession of events, especially during longer runs. The controlled logical clock algorithm accounts for such violations in point-to-point communication by shifting message events in time as much as needed while trying to preserve the length of local intervals. In this article, we describe how the controlled logical clock is extended to collective communication to enable the correction of realistic message-passing traces. We present a parallel version of the algorithm that scales to more than a thousand processes and evaluate its accuracy by showing that it eliminates inconsistent inter-process timings while preserving the length of local intervals.
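
The two ideas named in the abstract, linear offset interpolation and the shifting of message events that violate the logical event order, can be illustrated with a minimal Python sketch. This is not the controlled logical clock algorithm itself: it only shows a simplified forward shift for a single point-to-point message, without the amortization the full algorithm uses to preserve local interval lengths. All function names, the per-process timestamp list, and the `min_latency` parameter are hypothetical and chosen for illustration only.

```python
def interpolate_offset(t, t_start, off_start, t_end, off_end):
    """Linearly interpolate a clock offset at local time t.

    Assumes the offset to a reference clock was measured twice,
    once at t_start (off_start) and once at t_end (off_end).
    """
    return off_start + (off_end - off_start) * (t - t_start) / (t_end - t_start)


def to_reference_time(t, t_start, off_start, t_end, off_end):
    """Map a local timestamp onto the reference clock."""
    return t + interpolate_offset(t, t_start, off_start, t_end, off_end)


def violates_clock_condition(send_ts, recv_ts, min_latency):
    """A receive must not appear earlier than its send plus the minimum latency."""
    return recv_ts < send_ts + min_latency


def forward_correct(timestamps, recv_index, send_ts, min_latency):
    """Simplified forward correction (illustrative assumption, not the paper's method).

    If the receive at recv_index violates the clock condition, move it to
    send_ts + min_latency and shift all later local events by the same
    amount, so the intervals between events after the receive keep their length.
    """
    earliest = send_ts + min_latency
    shift = max(0.0, earliest - timestamps[recv_index])
    return [ts + shift if i >= recv_index else ts
            for i, ts in enumerate(timestamps)]
```

For example, calling `forward_correct` on a receiver whose receive event is stamped earlier than the matching send plus `min_latency` moves that receive and every later event on the same process forward by the same amount. The interval directly before the receive grows as a result, which is the kind of distortion the controlled logical clock tries to limit.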