STORM: lightning-fast resource management

  • Authors:
  • Eitan Frachtenberg; Fabrizio Petrini; Juan Fernandez; Scott Pakin; Salvador Coll

  • Affiliations:
  • Los Alamos National Laboratory (all authors)

  • Venue:
  • Proceedings of the 2002 ACM/IEEE conference on Supercomputing
  • Year:
  • 2002

Abstract

Although workstation clusters are a common platform for high-performance computing (HPC), they remain more difficult to manage than sequential systems or even symmetric multiprocessors. Furthermore, as cluster sizes increase, the quality of the resource-management subsystem (essentially, all of the code that runs on a cluster other than the applications) increasingly impacts application efficiency. In this paper, we present STORM, a resource-management framework designed for scalability and performance. The key innovation behind STORM is a software architecture that enables resource management to exploit low-level network features. As a result of this HPC-application-like design, STORM is orders of magnitude faster than the best reported results in the literature on two sample resource-management functions: job launching and process scheduling.