We consider a distributed stream processing application expressed as a data-flow graph, with operators as vertices connected by streams, and deployed over a cluster of compute nodes. In such applications, a small subset of the operators often forms the performance bottleneck for the entire application. When a bottleneck operator is stateless, splitting the incoming stream among multiple parallel operators deployed on different nodes clearly helps improve performance. The benefit is less obvious when the bottleneck operator is stateful. In that case, parallelization is much more challenging because it often requires a state-sharing mechanism for the parallel operators, and it incurs additional overhead from their accesses to the shared state and to synchronization constructs. In this paper, we propose a parallelization framework for stateful stream processing operators. The framework not only addresses the system model and support for operator parallelization, but also develops a theoretical model of the suitability of parallelization and the optimal degree of parallelism. We have implemented and evaluated our framework in the context of IBM's System S distributed stream processing middleware. Microbenchmarks validate the proposed theoretical model, and a parallelized implementation of a moving k-nearest-neighbor (KNN) application serves as the evaluation workload.
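The contrast between stateless and stateful parallelization can be illustrated with a minimal Python sketch. It is not the paper's framework or the System S API: the round-robin splitter, the squaring operator, the `SharedCounter` class, and all function names are hypothetical choices made for illustration. Threads stand in for operators on separate nodes, so the sketch shows the structure of the problem (every stateful replica must pass through a synchronized access to shared state) rather than any real speedup.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(stream, n):
    """Partition a stream of tuples among n parallel operator replicas."""
    parts = [[] for _ in range(n)]
    for i, item in enumerate(stream):
        parts[i % n].append(item)
    return parts


def stateless_op(x):
    # Hypothetical stateless operator: output depends only on the input tuple.
    return x * x


class SharedCounter:
    """Hypothetical shared state: a per-key count guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.counts = {}

    def update(self, key):
        # Every replica serializes on this lock; this is the synchronization
        # overhead that stateful parallelization has to pay.
        with self._lock:
            self.counts[key] = self.counts.get(key, 0) + 1


def run_stateless(stream, n):
    # Replicas never interact, so the split alone is enough.
    parts = split_round_robin(stream, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = pool.map(lambda part: [stateless_op(x) for x in part], parts)
        return sorted(y for part in results for y in part)


def run_stateful(stream, n):
    # Replicas share one state object and synchronize on each update.
    state = SharedCounter()
    parts = split_round_robin(stream, n)

    def replica(part):
        for key in part:
            state.update(key)

    with ThreadPoolExecutor(max_workers=n) as pool:
        list(pool.map(replica, parts))  # wait for all replicas to finish
    return state.counts
```

In the stateless case, adding replicas adds no coordination cost. In the stateful case, each additional replica contends for the same lock, which is why the suitability of parallelization and the degree of parallelism need to be modeled rather than assumed.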