Managing parallelism for stream processing in the cloud

  • Authors:
  • Nathan Backman; Rodrigo Fonseca; Uğur Çetintemel

  • Affiliations:
  • Brown University (all authors)

  • Venue:
  • Proceedings of the 1st International Workshop on Hot Topics in Cloud Data Processing
  • Year:
  • 2012

Abstract

Stream processing applications run continuously and have varying load. Cloud infrastructures present an attractive option for meeting these fluctuating computational demands. Coordinating such resources efficiently to meet end-to-end latency objectives is important in preventing wasteful use of cloud resources. We present a framework that parallelizes and schedules workflows of stream operators, in real time, to meet latency objectives. It supports data- and task-parallel processing of all workflow operators, by all computing nodes, while maintaining the ordering properties of sorted data streams. We show that a latency-oriented operator scheduling policy, coupled with the diversification of computing node responsibilities, encourages parallelism models that achieve end-to-end latency-minimization goals. We demonstrate the effectiveness of our framework with preliminary experimental results on a variety of real-world applications running on heterogeneous clusters.
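
One point the abstract raises is preserving the ordering properties of sorted data streams while tuples are processed data-parallel. The sketch below is a minimal illustration of one common way to do this: tag each tuple with a sequence number before dispatching it to parallel workers and release results from a reorder buffer only in sequence order. The function names, the thread-pool setup, and the toy operator are illustrative assumptions, not the paper's actual mechanism.

```python
import heapq
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_operator(value):
    # Stand-in for a stream operator with variable per-tuple latency (hypothetical).
    time.sleep(random.uniform(0.0, 0.01))
    return value * 2

def ordered_parallel_map(tuples, operator, workers=4):
    """Apply `operator` to tuples in parallel, but emit results in the
    original stream order using sequence numbers and a reorder buffer."""
    reorder_buffer, next_seq = [], 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Tag each tuple with its position in the sorted input stream.
        seq_of = {pool.submit(operator, t): seq for seq, t in enumerate(tuples)}
        for done in as_completed(seq_of):
            # Results arrive in completion order; buffer them by sequence number.
            heapq.heappush(reorder_buffer, (seq_of[done], done.result()))
            # Release only the contiguous prefix that is safe to emit.
            while reorder_buffer and reorder_buffer[0][0] == next_seq:
                yield heapq.heappop(reorder_buffer)[1]
                next_seq += 1

if __name__ == "__main__":
    # Output order matches input order despite out-of-order completion.
    print(list(ordered_parallel_map(range(10), slow_operator)))
```

A real deployment would additionally need to bound the reorder buffer, handle backpressure, and partition the stream across computing nodes rather than local threads; the sketch only shows how sequence numbering lets parallel workers preserve sorted-stream order at a merge point.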