Flux: a mechanism for building robust, scalable dataflows

Authors:
Mehul Arunkumar Shah;Joseph M. Hellerstein
Affiliations:
University of California, Berkeley;University of California, Berkeley
Venue:
Flux: a mechanism for building robust, scalable dataflows
Year:
2004

Citing 0
Cited 1

Adaptive input admission and management for parallel stream processing

Proceedings of the 7th ACM international conference on Distributed event-based systems

Abstract

We present techniques for robustly scaling high-throughput, 24x7 data-stream processing applications. Examples of such applications include intrusion or denial-of-service detection, click-stream processing, and online analysis of financial quote streams. In the TelegraphCQ project, we implement these applications using a general-purpose continuous query (CQ) engine that executes long-running dataflows. To scale the performance of these dataflows, we parallelize them across a cluster of workstations. For these critical applications, high availability, fault tolerance, and scalability are important goals. These goals are challenging to achieve on a cluster because machines are bound to fail and load imbalances are likely to arise. In this thesis, we develop the design of Flux, a reusable communication abstraction that enables long-running, parallel dataflows to adapt on-the-fly to these problems. Flux encapsulates mechanisms that allow a dataflow to mask failures and to recover from them automatically as they occur during execution. Flux leverages these same mechanisms to periodically rebalance a dataflow and keep it running efficiently. By encapsulating the critical fault-tolerance and load-balancing logic in Flux, we enable its reuse in a variety of dataflow applications with little modification to existing dataflow components and interfaces. Thus, by simply constructing a parallel dataflow using Flux, an application developer can make the dataflow robust.
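
To make the idea of a communication layer that both rebalances and masks failures more concrete, the following is a minimal Python sketch. It is not Flux's actual design or API: the class `PartitionRouter`, its methods, and the bucket-based routing scheme are all hypothetical names and choices made for illustration. The sketch assumes integer keys and models only routing decisions; it omits the operator-state movement and the handling of in-flight tuples that a real system would need.

```python
from collections import defaultdict


class PartitionRouter:
    """Toy routing layer between producers and parallel consumers.

    Hypothetical illustration only: keys map to buckets, and buckets
    map to consumer nodes. Reassigning a bucket is the single mechanism
    used both for load balancing and for reacting to a node failure.
    """

    def __init__(self, nodes, num_buckets=8):
        self.nodes = list(nodes)
        self.num_buckets = num_buckets
        # Initial placement: buckets are dealt round-robin to nodes.
        self.owner = {b: self.nodes[b % len(self.nodes)]
                      for b in range(num_buckets)}
        self.load = defaultdict(int)  # tuples seen per bucket

    def route(self, key):
        """Return the node that should process a tuple with this integer key."""
        bucket = key % self.num_buckets
        self.load[bucket] += 1
        return self.owner[bucket]

    def node_load(self):
        """Total observed load per live node."""
        totals = {n: 0 for n in self.nodes}
        for bucket, node in self.owner.items():
            if node in totals:
                totals[node] += self.load[bucket]
        return totals

    def rebalance(self):
        """Move one bucket from the most to the least loaded node.

        Returns the moved bucket, or None if no move narrows the gap.
        """
        totals = self.node_load()
        hot = max(self.nodes, key=lambda n: totals[n])
        cold = min(self.nodes, key=lambda n: totals[n])
        gap = totals[hot] - totals[cold]
        # Only buckets lighter than the gap can reduce the imbalance.
        candidates = [b for b, n in self.owner.items()
                      if n == hot and 0 < self.load[b] < gap]
        if not candidates:
            return None
        # Prefer the bucket whose load is closest to half the gap.
        bucket = min(candidates, key=lambda b: abs(self.load[b] - gap / 2))
        self.owner[bucket] = cold
        return bucket

    def fail(self, node):
        """Remove a failed node and hand its buckets to the survivors."""
        self.nodes.remove(node)
        if not self.nodes:
            raise RuntimeError("no surviving nodes")
        for bucket, owner in list(self.owner.items()):
            if owner == node:
                totals = self.node_load()
                self.owner[bucket] = min(self.nodes, key=lambda n: totals[n])


if __name__ == "__main__":
    router = PartitionRouter(["a", "b"], num_buckets=4)
    for key in [0] * 6 + [2] * 4 + [1, 3]:  # skewed input stream
        router.route(key)
    print("before:", router.node_load())
    print("moved bucket:", router.rebalance())
    print("after:", router.node_load())
    router.fail("b")
    print("owners after failure:", router.owner)
```

The point of the sketch is the design choice the abstract highlights: because producers and consumers only see `route`, the surrounding dataflow operators need no changes when a bucket is moved for load balancing or reassigned after a failure.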