Evil twins: two models for TCB reduction in HPC clusters

  • Authors:
  • Jacob Gorm Hansen;Eske Christiansen;Eric Jul

  • Affiliations:
  • University of Copenhagen, Denmark;University of Copenhagen, Denmark;University of Copenhagen, Denmark

  • Venue:
  • ACM SIGOPS Operating Systems Review
  • Year:
  • 2007

Abstract

Traditional high-performance computing systems require extensive management and suffer from security and configuration problems. This paper presents two generations of a cluster-management system that aims to make clusters as secure and self-managing as possible. The goal of the system is minimality: all nodes in a cluster are configured with a minimal software base consisting of a virtual machine monitor and a remote bootstrapping mechanism, and customers then buy access using a simple pre-paid token scheme. All necessary application software, including the operating system, is provided by the customer as a full virtual machine, and bootstrapped or migrated into the cluster. We have explored two different models for cluster control. The first, a decentralized push model ("Evil Man"), requires direct network access to cluster nodes, each of which is running a truly minimal control-plane implementation consisting of only a few hundred lines of C code. In the second, a centralized pull model ("Evil Twin"), nodes may be running behind NATs or firewalls, and are controlled by a centralized web service. A specially developed cache-invalidation protocol tells nodes when to reload their workload description from the centralized service.
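
To illustrate the pull model in general terms, the following minimal Python sketch shows a node that caches a workload description and reloads it only after being told its copy is stale. It is not the paper's protocol: the class names (`ControlService`, `Node`), the version counter, and the in-process method calls are all assumptions made for illustration, and the way an invalidation actually reaches a node behind a NAT or firewall is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Tuple


@dataclass
class ControlService:
    """Hypothetical stand-in for the centralized web service."""
    version: int = 0
    workload: dict = field(default_factory=dict)

    def update(self, workload: dict) -> None:
        # Publishing a new workload description bumps the version;
        # a version mismatch is what marks node-side caches as stale.
        self.workload = dict(workload)
        self.version += 1

    def fetch(self) -> Tuple[int, dict]:
        # Nodes pull from the service; the service never connects to a node.
        return self.version, dict(self.workload)


@dataclass
class Node:
    """Hypothetical cluster node holding a cached workload description."""
    service: ControlService
    cached_version: int = -1
    cached_workload: dict = field(default_factory=dict)
    fetches: int = 0

    def on_invalidate(self, latest_version: int) -> bool:
        # Reload only when the cached copy is out of date (assumed policy).
        if latest_version == self.cached_version:
            return False
        self.cached_version, self.cached_workload = self.service.fetch()
        self.fetches += 1
        return True
```

In this sketch, calling `on_invalidate` twice with the same version triggers only one fetch, which is the point of pairing a pull model with invalidation rather than having every node reload on a fixed schedule.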