Enterprises rely critically on the timely and sustained delivery of information. To support this need, we augment information flow middleware with new functionality that provides high levels of availability to distributed applications while maximizing the utility end users derive from such information. Specifically, the paper presents utility-driven 'proactive availability-management' techniques that offer (1) information flows that dynamically self-determine their availability requirements based on high-level utility specifications, (2) flows that can trade recovery time for performance based on the 'perceived' stability of the underlying system and on failure predictions (early alarms) for it, and (3) methods, based on real-world case studies, for dealing with both transient and non-transient failures. Utility-driven 'proactive availability-management' is integrated into information flow middleware and used with representative applications. Experiments reported in the paper demonstrate the middleware's ability to self-determine availability guarantees, to offer improved performance over a statically configured system, and to remain resilient to a wide range of faults.