Exploring the Relationship Between Parallel Application Run-Time Variability and Network Performance in Clusters

  • Authors:
  • Jeffrey J. Evans; Cynthia S. Hood; William D. Gropp

  • Venue:
  • LCN '03: Proceedings of the 28th Annual IEEE International Conference on Local Computer Networks
  • Year:
  • 2003

Abstract

Highly variable parallel application execution time is a persistent issue in cluster computing environments, and can be particularly acute in systems composed of Networks of Workstations (NOWs). We are looking at this issue in terms of consistency. In particular, we are focusing on network performance. Before we can use techniques from fault management to attain consistency, this paper presents our preliminary analysis of run-time variability from logs and experiments, exposing important issues related to systemic inconsistency in NOW clusters. The characterization of application sensitivity can be used to set network performance goals, thereby defining operational requirements. Network performance depends on the virtual topology imposed by the scheduler's allocation of nodes and the communication patterns of the set of running applications. Therefore it is important to look at both the network and the cluster's centralized node mapper (scheduler) as critical subsystems.