The Recovery-Oriented Computing (ROC) project studies techniques to help systems recover quickly from inevitable failures. ROC research focuses mainly on Internet services because they can grow to immense proportions, are subject to perpetual evolution, have varying workloads, and are expected to run 24/7. The project has implemented two building blocks for recovery: microreboot and system-level undo. The researchers believe that most of what they have learned from Internet services can also be applied to desktops, smaller network services, and other computing environments.