Placement of replicated tasks for distributed stream processing systems
Proceedings of the Fourth ACM International Conference on Distributed Event-Based Systems
A significant number of emerging online data analysis applications require the processing of data streams, that is, large amounts of continuously updated data, to generate outputs of interest or to identify meaningful events. Example domains include network traffic management, stock price monitoring, customized e-commerce websites, and analysis of sensor data. In this paper we look at the problem of high availability in such distributed stream processing systems. By taking into account the particular characteristics of stream processing applications, we first identify design principles for a replica placement algorithm for high availability. We incorporate these principles in a decentralized replica placement protocol that aims to maximize availability while respecting resource constraints and making performance-aware placement decisions. We have integrated our replica placement protocol in Synergy, our distributed stream processing middleware. Our experimental comparison over PlanetLab with the current state of the art corroborates our claims that our techniques maximize availability while sustaining good performance.
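The abstract names three concerns for replica placement: availability, resource constraints, and performance. The sketch below illustrates how a single node might weigh those concerns when choosing where to put one replica. It is not the protocol from the paper, whose details are not given here. The `Node` fields, the `place_replica` and `combined_availability` helpers, the greedy rule, and the independent-failure assumption are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_capacity: float   # spare processing capacity (hypothetical unit)
    availability: float    # estimated probability the node is up, in [0, 1]
    latency_ms: float      # latency to the upstream component (hypothetical)


def place_replica(nodes, primary, load, max_latency_ms):
    """Pick a node for one replica of a component whose primary runs on `primary`.

    Illustrative greedy rule (an assumption, not the paper's protocol):
      1. never co-locate the replica with the primary,
      2. respect the resource constraint (free_capacity >= load),
      3. respect the performance bound (latency_ms <= max_latency_ms),
      4. among the remaining nodes, prefer the highest availability and
         break ties by the lowest latency.
    Returns the chosen Node, or None if no node qualifies.
    """
    candidates = [
        n for n in nodes
        if n.name != primary
        and n.free_capacity >= load
        and n.latency_ms <= max_latency_ms
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: (n.availability, -n.latency_ms))
    best.free_capacity -= load  # reserve the capacity on the chosen node
    return best


def combined_availability(primary_avail, replica_avail):
    """Availability of a primary/replica pair, assuming independent failures."""
    return 1.0 - (1.0 - primary_avail) * (1.0 - replica_avail)
```

Under these assumptions, a node with the highest availability is still skipped if it lacks capacity or violates the latency bound, which is the sense in which the placement is both resource-constrained and performance-aware. A decentralized protocol would additionally need nodes to exchange this state with one another, which the sketch leaves out.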