HPC environment management: new challenges in the petaflop era

  • Authors:
  • Jonas Dias; Albino Aveleda

  • Affiliations:
  • Federal University of Rio de Janeiro, Centro de Tecnologia, Rio de Janeiro, Brazil; Federal University of Rio de Janeiro, Centro de Tecnologia, Rio de Janeiro, Brazil

  • Venue:
  • VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
  • Year:
  • 2010

Abstract

High Performance Computing (HPC) is becoming increasingly popular. The largest supercomputers in the world now have hundreds of thousands of processors and, consequently, may experience more software and hardware failures. HPC center managers also have to deal with multiple clusters from different vendors, each with its own architecture. However, since there are not enough HPC experts to manage all the new supercomputers, non-experts are expected to manage these large clusters. In this paper we study the new challenges of managing HPC environments that contain clusters of different sizes and architectures. We review the available tools and present LEMMing [1], an easy-to-use open source tool developed to support high performance computing centers. LEMMing integrates machine resources and the available management and monitoring tools into a single point of management.