Safe and automatic live update for operating systems

  • Authors:
  • Cristiano Giuffrida;Anton Kuijsten;Andrew S. Tanenbaum

  • Affiliations:
  • Vrije Universiteit, Amsterdam, Netherlands;Vrije Universiteit, Amsterdam, Netherlands;Vrije Universiteit, Amsterdam, Netherlands

  • Venue:
  • Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Increasingly many systems have to run all the time with no downtime allowed. Consider, for example, systems controlling electric power plants and e-banking servers. Nevertheless, security patches and a constant stream of new operating system versions need to be deployed without stopping running programs. These factors naturally lead to a pressing demand for live update---upgrading all or parts of the operating system without rebooting. Unfortunately, existing solutions require significant manual intervention and thus work reliably only for small operating system patches. In this paper, we describe an automated system for live update that can safely and automatically handle major upgrades without rebooting. We have implemented our ideas in Proteos, a new research OS designed with live update in mind. Proteos relies on system support and nonintrusive instrumentation to handle even very complex updates with minimal manual effort. The key novelty is the idea of state quiescence, which allows updates to happen only in safe and predictable system states. A second novelty is the ability to automatically perform transactional live updates at the process level, ensuring a safe and stable update process. Unlike prior solutions, Proteos supports automated state transfer, state checking, and hot rollback. We have evaluated Proteos on 50 real updates and on novel live update scenarios. The results show that our techniques can effectively support both simple and complex updates, while outperforming prior solutions in terms of flexibility, security, reliability, and stability of the update process.