RTCSA '06 Proceedings of the 12th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications
In today's rapidly evolving marketplace, the ability to build and deploy new systems quickly is an increasingly critical factor in a company's success. In certain domains, such as telecommunications, high availability is taken for granted, with expectations of "five nines" (99.999%) availability or higher, which translates to roughly five minutes of downtime per year or less. However, building highly available systems is challenging, and it becomes more so as systems grow in complexity. High availability (HA) middleware partially addresses this challenge by providing common HA services that system developers can use, but developers must still spend significant effort integrating their systems with the middleware. In this paper, we present the Aurora Management Workbench (AMW) as a solution to this integration problem. AMW combines HA middleware with tools for building highly available distributed software systems. It is unique in its approach to developing such systems: developers focus only on describing, in the form of a model, the key architectural abstractions of their system and its high availability needs. Tools then use the model to generate much of the code needed to integrate the system with the AMW HA middleware, which also uses the model to coordinate and control HA services at run time. This paper describes our approach and our initial successes in using it to develop commercial telecom systems.
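The downtime figure quoted for "five nines" can be checked with a short calculation. The sketch below is only illustrative and is not part of AMW: the function name and the use of a 365.25-day year are assumptions made here. It converts an availability level, expressed as a number of nines, into the yearly downtime it allows.

```python
# Illustrative only: the names below are hypothetical and not taken from AMW.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average year of 365.25 days


def downtime_minutes_per_year(nines: int) -> float:
    """Return the yearly downtime (in minutes) allowed by `nines` nines of availability."""
    unavailability = 10.0 ** (-nines)  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in range(3, 7):
        print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Under these assumptions, five nines allows about 5.26 minutes of downtime per year, and each additional nine cuts the budget by a factor of ten.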