Utility-driven proactive management of availability in enterprise-scale information flows

Authors:
Zhongtang Cai; Vibhore Kumar; Brian F. Cooper; Greg Eisenhauer; Karsten Schwan; Robert E. Strom
Affiliations:
College of Computing, Georgia Institute of Technology, Atlanta, GA (Cai, Kumar, Cooper, Eisenhauer, Schwan); IBM Watson Research Center, Hawthorne, NY (Strom)
Venue:
Middleware '06: Proceedings of the 7th ACM/IFIP/USENIX International Conference on Middleware
Year:
2006

Abstract

Enterprises rely critically on the timely and sustained delivery of information. To support this need, we augment information flow middleware with new functionality that provides high availability to distributed applications while also maximizing the utility that end users derive from that information. Specifically, the paper presents utility-driven 'proactive availability management' techniques that offer (1) information flows that dynamically determine their own availability requirements from high-level utility specifications, (2) flows that trade recovery time for performance based on the 'perceived' stability of the underlying system and on failure predictions (early alarms) for it, and (3) methods, derived from real-world case studies, for handling both transient and non-transient failures. Utility-driven proactive availability management is integrated into information flow middleware and used with representative applications. Experiments reported in the paper demonstrate the middleware's ability to self-determine availability guarantees, to deliver better performance than a statically configured system, and to remain resilient across a wide range of faults.
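To make point (2) more concrete, here is a minimal, hypothetical sketch of a utility-driven tradeoff between recovery time and performance. It is not the paper's method: the configuration names (`lazy`, `moderate`, `eager`), the per-configuration overhead and recovery-time numbers, and the linear utility form with its weights are all invented for illustration. The idea it shows is only that a predicted failure probability (for example, raised by an early alarm) can shift the utility-maximizing choice from a cheap configuration toward one that recovers faster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical availability configuration for an information flow."""
    name: str
    overhead: float         # assumed fraction of throughput spent on availability work
    recovery_time_s: float  # assumed expected recovery time after a failure, in seconds


def expected_utility(cfg: Config, failure_prob: float,
                     perf_weight: float = 1.0, recovery_weight: float = 0.02) -> float:
    """Assumed linear utility: reward delivered performance and penalize
    expected recovery time, weighted by the predicted failure probability."""
    if not 0.0 <= failure_prob <= 1.0:
        raise ValueError("failure_prob must lie in [0, 1]")
    performance = 1.0 - cfg.overhead
    expected_recovery_cost = failure_prob * cfg.recovery_time_s
    return perf_weight * performance - recovery_weight * expected_recovery_cost


def choose_config(configs: list[Config], failure_prob: float, **weights: float) -> Config:
    """Pick the configuration with the highest expected utility."""
    return max(configs, key=lambda c: expected_utility(c, failure_prob, **weights))


# Made-up candidate configurations, ordered from cheapest to fastest-recovering.
CONFIGS = [
    Config("lazy", overhead=0.02, recovery_time_s=60.0),
    Config("moderate", overhead=0.08, recovery_time_s=15.0),
    Config("eager", overhead=0.20, recovery_time_s=2.0),
]

if __name__ == "__main__":
    for p in (0.01, 0.2, 0.9):  # stable system, mild concern, early alarm
        print(p, choose_config(CONFIGS, p).name)
```

With these made-up numbers, a stable system (predicted failure probability 0.01) favors `lazy`, a moderate prediction (0.2) favors `moderate`, and an early alarm (0.9) favors `eager`. A real system would derive the utility function from the high-level specifications the abstract describes rather than from fixed weights.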