It is well established that most operating system crashes are caused by bugs in device drivers. Because drivers are normally linked into the kernel address space, a buggy driver can overwrite kernel tables and bring the whole system to a halt. We have greatly mitigated this problem by reducing the kernel to an absolute minimum and running each driver as a separate, unprivileged user-mode process. In addition, we implemented a POSIX-conformant operating system, MINIX 3, as multiple user-mode servers. In this design, a server or driver failure is no longer fatal and does not require rebooting the computer. This paper discusses how we designed and implemented the system, which problems we encountered, and how we solved them. We also discuss the performance effects of our changes and evaluate how our multiserver design improves operating system dependability over monolithic designs.
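The claim that a driver failure need not be fatal once the driver runs as its own user-mode process can be illustrated with a small sketch. The code below is not MINIX 3's mechanism; it is a generic illustration using ordinary POSIX process calls from Python, and the names `run_driver` and `supervise`, the crash pattern, and the restart limit are all hypothetical. A supervising process starts a "driver" in a child process, notices when the child is killed, and starts a fresh copy.

```python
import os
import signal


def run_driver(attempt):
    """Hypothetical driver body: crashes on the first two attempts."""
    if attempt < 2:
        # Stand-in for a fatal driver bug: the process kills itself.
        os.kill(os.getpid(), signal.SIGKILL)
    os._exit(0)  # clean exit once the "bug" no longer triggers


def supervise(max_restarts=5):
    """Run the driver in a child process and restart it after a crash.

    Returns a log of what happened to each attempt.
    """
    log = []
    for attempt in range(max_restarts + 1):
        pid = os.fork()
        if pid == 0:
            # Child: isolated address space, so a crash here cannot
            # corrupt the supervisor's memory.
            run_driver(attempt)
            os._exit(0)
        # Parent: wait for the child and inspect how it ended.
        _, status = os.waitpid(pid, 0)
        if os.WIFSIGNALED(status):
            log.append(f"attempt {attempt}: killed by signal {os.WTERMSIG(status)}")
            continue  # restart the driver
        log.append(f"attempt {attempt}: exited with code {os.WEXITSTATUS(status)}")
        break
    return log


if __name__ == "__main__":
    for line in supervise():
        print(line)
```

The supervisor survives both crashes because each fault is confined to the child's address space; in a monolithic kernel the equivalent bug would run with full kernel privileges.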