Borealis-R: a replication-transparent stream processing system for wide-area monitoring applications
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
We present a replication-based approach that achieves both fast and highly available stream processing over wide-area networks. In our approach, multiple operator replicas send outputs to each downstream replica, which uses whichever copy of the data arrives first. To further expedite the data flow, replicas run independently and may process data in different orders. Despite this complication, our approach always delivers the results that non-replicated, failure-free processing would produce. We call this guarantee replication transparency. In this paper, we first discuss the semantic issues underlying replication transparency and extend stream-processing primitives accordingly. Next, we develop an algorithm that manages replicas at geographically dispersed servers; it strives to achieve the best latency guarantee relative to the cost of replication. Finally, we substantiate the utility of our work through experiments on PlanetLab servers as well as simulations based on real network traces.
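The "use whichever data arrives first" behavior described in the abstract can be illustrated with a minimal sketch. The class name, sequence-number tagging, and API below are hypothetical illustrations, not the paper's actual implementation: a downstream replica merges the redundant streams from its upstream replicas, forwards the first-arriving copy of each tuple, and discards later duplicates from slower replicas.

```python
class FirstWinsMerger:
    """Hypothetical merger for streams from redundant upstream replicas.

    Each tuple is assumed to carry a sequence number; the first copy seen
    for a given sequence number is forwarded, and later copies arriving
    from slower replicas are dropped as duplicates.
    """

    def __init__(self):
        self.seen = set()  # sequence numbers already forwarded downstream

    def on_arrival(self, seq, payload):
        if seq in self.seen:
            return None       # duplicate from a slower replica: discard
        self.seen.add(seq)
        return payload        # first-arriving copy: forward downstream


# Usage: two replicas emit the same logical stream with different delays,
# so some tuples arrive twice and possibly out of order.
merger = FirstWinsMerger()
forwarded = []
arrivals = [(1, "a"), (1, "a"), (2, "b"), (3, "c"), (2, "b")]
for seq, payload in arrivals:
    result = merger.on_arrival(seq, payload)
    if result is not None:
        forwarded.append(result)
# forwarded == ["a", "b", "c"]
```

Note that replicas processing data in different orders means the merged copies need not be identical byte streams; the paper's replication-transparency guarantee is what ensures the delivered results still match non-replicated, failure-free processing.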