Lessons from Giant-Scale Services
IEEE Internet Computing
TPC-W: A Benchmark for E-Commerce
IEEE Internet Computing
A Low Latency, Loss Tolerant Architecture and Protocol for Wide Area Group Communication
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
Lazy modular upgrades in persistent object stores
OOPSLA '03 Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programing, systems, languages, and applications
Devirtualizable virtual machines enabling general, single-node, online maintenance
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
Practical dynamic software updating for C
Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Reducing downtime due to system maintenance and upgrades
LISA '05 Proceedings of the 19th conference on Large Installation System Administration Conference - Volume 19
Understanding and dealing with operator mistakes in internet services
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Why do internet services fail, and what can be done about it?
USITS'03 Proceedings of the 4th conference on USENIX Symposium on Internet Technologies and Systems - Volume 4
Understanding and validating database system administration
ATEC '06 Proceedings of the annual conference on USENIX '06 Annual Technical Conference
Staged deployment in mirage, an integrated software upgrade testing and distribution system
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
No more HotDependencies: toward dependency-agnostic online upgrades in distributed systems
HotDep'07 Proceedings of the 3rd workshop on on Hot Topics in System Dependability
Package upgrades in FOSS distributions: details and challenges
Proceedings of the 1st International Workshop on Hot Topics in Software Upgrades
Safe and timely updates to multi-threaded programs
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Automated detection of refactorings in evolving components
ECOOP'06 Proceedings of the 20th European conference on Object-Oriented Programming
Dependable, online upgrades in enterprise systems
Proceedings of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications
Dependable, online upgrades in enterprise systems
Proceedings of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications
Toward upgrades-as-a-service in distributed systems
Proceedings of the 10th ACM/IFIP/USENIX International Conference on Middleware
Report on the fourth workshop on hot topics in software upgrades (HotSWUp 2012)
ACM SIGOPS Operating Systems Review
Safe and automatic live update for operating systems
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Back to the future: fault-tolerant live update with time-traveling state transfer
LISA'13 Proceedings of the 27th international conference on Large Installation System Administration
Hi-index | 0.00 |
Enterprise-system upgrades are unreliable and often produce downtime or data-loss. Errors in the upgrade procedure, such as broken dependencies, constitute the leading cause of upgrade failures. We propose a novel upgrade-centric fault model, based on data from three independent sources, which focuses on the impact of procedural errors rather than software defects. We show that current approaches for upgrading enterprise systems, such as rolling upgrades, are vulnerable to these faults because the upgrade is not an atomic operation and it risks breaking hidden dependencies among the distributed system-components. We also present a mechanism for tolerating complex procedural errors during an upgrade. Our system, called Imago, improves availability in the fault-free case, by performing an online upgrade, and in the faulty case, by reducing the risk of failure due to breaking hidden dependencies. Imago performs an end-to-end upgrade atomically and dependably, by dedicating separate resources to the new version and by isolating the old version from the upgrade procedure. Through fault injection, we show that Imago is more reliable than online-upgrade approaches that rely on dependency-tracking and that create system states with mixed versions.