High-Availability Algorithms for Distributed Stream Processing
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
A dynamically adaptive distributed system for processing complex continuous queries
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce optimization using regulated dynamic prioritization
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Clustersheddy: load shedding using moving clusters over spatio-temporal data streams
DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
Managing parallelism for stream processing in the cloud
Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing
The long-running nature of continuous queries, coupled with their high scalability requirements, poses new challenges for dataflow processing. CQ systems execute pipelined dataflows that are shared across multiple queries and whose scalability is limited by their constituent stateful operators -- e.g., a windowed group-by aggregate. To scale such operators, a natural solution is to partition them across a shared-nothing platform. In the CQ context, however, traditional static techniques for partitioned parallelism can exhibit detrimental imbalances as workload and runtime conditions evolve. Long-running CQ dataflows must continue to function robustly in the face of these imbalances. To address this challenge, we introduce a dataflow operator called Flux that encapsulates adaptive state partitioning and dataflow routing. Flux is placed between producer-consumer stages in a dataflow pipeline to repartition stateful operators while the pipeline is still executing. We present the Flux architecture, along with repartitioning policies that can be used for CQ operators under shifting processing and memory loads. We show that the Flux mechanism and these policies can provide a several-fold improvement in throughput, and orders-of-magnitude improvement in average latency, over the static case.
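To make the idea concrete, the following is a minimal sketch of a Flux-style adaptive partitioner, not the paper's actual implementation. All names (`FluxRouter`, `route`, `rebalance`) are hypothetical. Tuples destined for a stateful group-by-aggregate are hash-routed via a partition map; when one node's load grows relative to the others, the hottest partition is migrated to the lightest node along with its operator state, so the aggregate stays correct while the stream keeps flowing.

```python
from collections import defaultdict

class FluxRouter:
    """Illustrative sketch of adaptive state partitioning (hypothetical API).

    Keys are hashed into a fixed set of partitions; partitions are assigned
    to nodes and can be reassigned at runtime, carrying their state along.
    """

    def __init__(self, num_nodes, num_partitions=16):
        self.num_partitions = num_partitions
        # Partition -> node assignment, initially round-robin.
        self.assignment = {p: p % num_nodes for p in range(num_partitions)}
        # Per-node operator state for a group-by-sum: node -> key -> running sum.
        self.state = defaultdict(lambda: defaultdict(int))
        # Per-partition tuple counts, used as the load-imbalance signal.
        self.load = defaultdict(int)

    def partition_of(self, key):
        return hash(key) % self.num_partitions

    def route(self, key, value):
        """Route one tuple to the node owning its partition; update its state."""
        p = self.partition_of(key)
        node = self.assignment[p]
        self.load[p] += 1
        self.state[node][key] += value
        return node

    def rebalance(self):
        """Migrate the hottest partition from the most- to the least-loaded
        node, moving its aggregate state so results remain correct."""
        node_load = defaultdict(int)
        for p, n in self.assignment.items():
            node_load[n] += self.load[p]
        src = max(node_load, key=node_load.get)
        dst = min(node_load, key=node_load.get)
        if src == dst:
            return
        hot = max((p for p, n in self.assignment.items() if n == src),
                  key=lambda p: self.load[p])
        # Move state for every key that hashes into the hot partition.
        moved = [k for k in self.state[src] if self.partition_of(k) == hot]
        for k in moved:
            self.state[dst][k] += self.state[src].pop(k)
        self.assignment[hot] = dst
```

A real Flux operator sits between pipeline stages and performs this migration transparently, pausing and buffering only the tuples bound for the partition in flight; the sketch above omits that short-term buffering and any fault-tolerance machinery.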