Flux: a mechanism for building robust, scalable dataflows

  • Authors:
  • Mehul Arunkumar Shah; Joseph M. Hellerstein

  • Affiliations:
  • University of California, Berkeley; University of California, Berkeley

  • Venue:
  • Flux: a mechanism for building robust, scalable dataflows
  • Year:
  • 2004

Abstract

We present techniques for robustly scaling high-throughput, 24x7 data-stream processing applications. Examples of such applications include intrusion or denial-of-service detection, click-stream processing, and online analysis of financial quote streams. In the TelegraphCQ project, we implement these applications using a general-purpose continuous query (CQ) engine that executes long-running dataflows. To scale the performance of these dataflows, we parallelize them across a cluster of workstations. For these critical applications, high availability, fault tolerance, and scalability are important goals. These goals are challenging to achieve on a cluster because machines are bound to fail and load imbalances are likely to arise. In this thesis, we develop the design of Flux, a reusable communication abstraction that enables long-running, parallel dataflows to adapt on the fly to these problems. Flux encapsulates mechanisms that allow a dataflow to mask failures and to recover from them automatically as they occur during execution. Flux leverages these same mechanisms to periodically rebalance a dataflow and keep it running efficiently. By encapsulating the critical fault-tolerance and load-balancing logic in Flux, we enable its reuse in a variety of dataflow applications with little modification to existing dataflow components and interfaces. Thus, simply by constructing a parallel dataflow using Flux, an application developer can make the dataflow robust.
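
To make the idea of a communication layer that reroutes work around failures and load imbalances more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the design developed in the thesis: the class name `ToyExchange`, its methods, and the partition-ownership table are all hypothetical, and the sketch ignores operator state, in-flight tuples, and replication, which a real mechanism would need to handle.

```python
import zlib
from collections import Counter


class ToyExchange:
    """Hypothetical exchange: routes keys to consumer nodes via fixed partitions."""

    def __init__(self, nodes, num_partitions=8):
        self.nodes = list(nodes)
        # Partition p is initially owned by node p mod len(nodes).
        self.owner = {p: self.nodes[p % len(self.nodes)]
                      for p in range(num_partitions)}

    def partition_of(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() on str.
        return zlib.crc32(key.encode("utf-8")) % len(self.owner)

    def route(self, key):
        """Return the node that currently owns the key's partition."""
        return self.owner[self.partition_of(key)]

    def load(self):
        """Count partitions per node, including live nodes that own none."""
        counts = Counter({n: 0 for n in self.nodes})
        counts.update(self.owner.values())
        return counts

    def fail(self, node):
        """Drop a failed node and hand each of its partitions to the
        currently least-loaded surviving node."""
        if self.nodes == [node]:
            raise RuntimeError("no surviving nodes")
        self.nodes.remove(node)
        for p, n in sorted(self.owner.items()):
            if n == node:
                counts = self.load()
                self.owner[p] = min(self.nodes, key=lambda m: counts[m])

    def rebalance(self):
        """Move partitions from the most-loaded to the least-loaded node
        until their partition counts differ by at most one."""
        while True:
            counts = self.load()
            hi = max(self.nodes, key=lambda m: counts[m])
            lo = min(self.nodes, key=lambda m: counts[m])
            if counts[hi] - counts[lo] <= 1:
                return
            p = next(p for p, n in sorted(self.owner.items()) if n == hi)
            self.owner[p] = lo


ex = ToyExchange(["a", "b", "c"], num_partitions=9)
ex.fail("b")                      # partitions owned by "b" move to "a" and "c"
print(ex.route("10.0.0.7"))       # always one of the surviving nodes
```

The point of the sketch is that producers only call `route`, so a failure or a rebalancing step changes the ownership table without touching the operators on either side. That separation is what the abstract means by encapsulating the adaptation logic in a reusable layer, although the thesis's actual protocol is considerably more involved.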