Quality assurance for clusters: acceptance-, stress-, and burn-in tests for general purpose clusters

  • Authors:
  • Matthias S. Müller; Guido Juckeland; Matthias Jurenz; Michael Kluge

  • Affiliations:
  • Technische Universität Dresden, Center for Information Services and High Performance Computing, Dresden, Germany (all four authors)

  • Venue:
  • HPCC '07: Proceedings of the Third International Conference on High Performance Computing and Communications
  • Year:
  • 2007

Abstract

Although common sense suggests that all nodes of a cluster should behave identically, since they consist of exactly the same hardware parts and run the same software, experience shows otherwise. We present a collection of programs and tools gathered over several years during cluster installations at different sites, on clusters from various vendors. The collection contains programs that check the setup, functionality, and performance of a cluster, covering components such as CPU, memory, disk, network, MPI, and the file system. Along with a short description of each tool, we describe our experiences using them.
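Because supposedly identical nodes can still differ, a natural first check is to run the same benchmark on every node and compare each result against the cluster as a whole. The following is a minimal sketch of that idea, not one of the authors' tools: it assumes per-node results are already collected into a dictionary, uses the median as the reference value, and applies a hypothetical 5 % tolerance. The function name `find_outlier_nodes`, the tolerance, and the node names in the usage note are illustrative assumptions.

```python
from statistics import median


def find_outlier_nodes(results, tolerance=0.05):
    """Flag nodes whose benchmark result deviates from the cluster median.

    `results` maps a node name to one measured value of a single benchmark
    (for example, memory bandwidth in GB/s). `tolerance` is a fraction:
    0.05 means a node is flagged if it differs from the median by more
    than 5 %. Both the metric and the default tolerance are assumptions
    chosen for illustration.

    Returns a dict mapping each outlier node to its relative deviation
    (negative = slower/lower than the median, positive = higher).
    """
    if not results:
        return {}
    # The median is robust against a few bad nodes skewing the reference.
    reference = median(results.values())
    if reference == 0:
        raise ValueError("median result is zero; relative deviation undefined")
    outliers = {}
    for node, value in results.items():
        deviation = (value - reference) / reference
        if abs(deviation) > tolerance:
            outliers[node] = deviation
    return outliers
```

For hypothetical results `{"node001": 10.2, "node002": 10.1, "node003": 8.7, "node004": 10.3}`, the median is 10.15, and only `node003` (about 14 % below the median) is reported. Using the median rather than the mean keeps a single faulty node from shifting the reference and masking itself.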