Timestamp synchronization for event traces of large-scale message-passing applications

  • Authors:
  • Daniel Becker; Rolf Rabenseifner; Felix Wolf

  • Affiliations:
  • Forschungszentrum Jülich, John von Neumann Institute for Computing, Jülich, Germany; University of Stuttgart, High-Performance Computing-Center, Stuttgart, Germany; Forschungszentrum Jülich, John von Neumann Institute for Computing, Jülich, Germany

  • Venue:
  • PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
  • Year:
  • 2007

Abstract

Identifying wait states in event traces of message-passing applications requires measuring temporal displacements between concurrent events. In the absence of synchronized hardware clocks, linear interpolation techniques can already account for differences in offset and drift, assuming that the drift of an individual processor is not time dependent. However, measurement inaccuracies and drifts that vary over time can still cause violations of the logical event ordering. The controlled logical clock algorithm accounts for such violations in point-to-point communication by shifting message events in time as much as needed while trying to preserve the length of intervals between local events. In this article, we describe how the controlled logical clock is extended to collective communication to enable a more complete correction of realistic message-passing traces. In addition, we present a parallel version of the algorithm that is intended to scale to thousands of application processes and outline its implementation within the framework of the Scalasca toolkit.
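
The two correction steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and not the full controlled logical clock: the function names, the two-point offset measurement, and the `min_latency` parameter are assumptions made for illustration. In particular, the sketch carries a forward shift unchanged to all later local events, whereas the published algorithm controls how such shifts are applied and smoothed.

```python
def interpolate(local_time, t_start, offset_start, t_end, offset_end):
    """Map a local timestamp to master time by linear interpolation.

    Assumes (hypothetically) that the offset to the master clock was measured
    once at local time t_start and once at t_end, and that drift is constant.
    """
    drift = (offset_end - offset_start) / (t_end - t_start)
    return local_time + offset_start + drift * (local_time - t_start)


def correct_process(timestamps, recv_constraints, min_latency):
    """Shift receive events forward so that each one follows its send.

    timestamps:       sorted event times of one process (already interpolated)
    recv_constraints: maps the index of a receive event to the timestamp of
                      the matching send event on the sender
    min_latency:      assumed minimum message latency

    A receive that violates send + min_latency <= receive is moved forward,
    and the same shift is applied to all later local events so that the
    intervals between them keep their original length.
    """
    corrected = []
    shift = 0.0
    for index, time in enumerate(timestamps):
        candidate = time + shift
        if index in recv_constraints:
            earliest = recv_constraints[index] + min_latency
            if candidate < earliest:
                shift += earliest - candidate
                candidate = earliest
        corrected.append(candidate)
    return corrected


if __name__ == "__main__":
    # Offsets of 2.0 at local time 0 and 4.0 at local time 200.
    print(interpolate(100.0, 0.0, 2.0, 200.0, 4.0))
    # The event at index 1 is a receive whose send happened at time 2.5.
    print(correct_process([1.0, 2.0, 3.0], {1: 2.5}, 0.5))
```

In the second call the receive at local time 2.0 precedes its send at 2.5, so it is moved to 3.0 and the following event moves by the same amount, keeping the local interval of 1.0 intact.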