In this paper, we argue that human-factors studies are critical in building a wide range of dependable systems. In particular, only with a deep understanding of the causes, types, and likelihoods of human mistakes can we build systems that prevent, hide, or at least tolerate human mistakes by design. We propose several research directions for studying how humans impact availability in the context of Internet services. In addition, we describe validation as one strategy for hiding human mistakes in these systems. Finally, we propose the use of operator, performance, and availability models to guide human actions. We conclude with a call for the systems community to make the human an integral, first-class concern in computer system design.