Impact of noise on scaling of collectives: an empirical evaluation

  • Authors:
  • Rahul Garg;Pradipta De

  • Affiliations:
  • IBM India Research Laboratory, Hauz Khas, New Delhi;IBM India Research Laboratory, Hauz Khas, New Delhi

  • Venue:
  • HiPC'06 Proceedings of the 13th international conference on High Performance Computing
  • Year:
  • 2006

Abstract

It is becoming increasingly evident that operating system interference, in the form of daemon activity and interrupts, contributes significantly to the performance degradation of parallel applications on large clusters. An earlier theoretical study evaluated the impact of system noise on application performance for different noise distributions [1]. Our work complements that theoretical analysis with an empirical study of noise in production clusters. We designed a parallel benchmark and ran it on large clusters at the San Diego Supercomputer Center to collect noise-related data. This data was fed to a simulator that predicts the performance of collective operations using the model of [1]. We report a comparison of the predicted and observed performance. Additionally, the tools developed in the process have been instrumental in identifying anomalous nodes that, if undetected, could degrade application performance.
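
To illustrate why per-node noise matters more as a collective scales, the following is a minimal sketch and not the simulator or the model of [1] described in the abstract. It assumes a simple compute-then-barrier loop in which each iteration finishes only when the slowest node arrives. The function names, the noise distribution (a rare fixed-length spike), and all parameter values are hypothetical and chosen only for illustration.

```python
import random


def simulate_barrier_iterations(num_nodes, work_time, noise_sampler, iterations, rng):
    """Return the completion time of each compute-then-barrier iteration.

    Assumption: every node computes for `work_time`, is delayed by an
    independently sampled amount of noise, and then waits at a barrier.
    """
    times = []
    for _ in range(iterations):
        # Each node's arrival time is its compute time plus its own noise delay.
        arrivals = [work_time + noise_sampler(rng) for _ in range(num_nodes)]
        # The barrier completes only when the slowest node arrives.
        times.append(max(arrivals))
    return times


def rare_spike(rng, probability=0.01, spike=50.0):
    """Hypothetical noise model: an occasional fixed-length interruption."""
    return spike if rng.random() < probability else 0.0


def mean_iteration_time(num_nodes, work_time=100.0, iterations=200, seed=0):
    """Average iteration time under the hypothetical rare-spike noise model."""
    rng = random.Random(seed)
    times = simulate_barrier_iterations(num_nodes, work_time, rare_spike, iterations, rng)
    return sum(times) / len(times)


if __name__ == "__main__":
    # With more nodes, it becomes more likely that at least one node is
    # interrupted in any given iteration, so the average tends to rise.
    for n in (1, 16, 256, 4096):
        print(n, mean_iteration_time(n))
```

Under these assumptions, a delay that is rare on any single node is hit by at least one node in most iterations once the node count is large, which is the kind of effect the empirical noise measurements are meant to quantify.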