Systems that are more dependable and less expensive to maintain may be more expensive to purchase. If ordinary customers cannot calculate the costs of downtime, such systems may not succeed because it will be difficult to justify a higher price. Hence, we propose an easy-to-calculate estimate of downtime.

As one reviewer commented, the cost estimate we propose "is simply a symbolic translation of the most obvious, common sense approach to the problem." We take this remark as a compliment, noting that prior work has ignored pieces of this obvious formula.

We introduce this formula, argue why it will be important to have a formula that can easily be calculated, suggest why it will be hard to get a more accurate estimate, and give some examples.

Widespread use of this obvious formula can lay a foundation for systems that reduce downtime.