Replica placement for high availability in distributed stream processing systems

  • Authors:
  • Thomas Repantis; Vana Kalogeraki

  • Affiliations:
  • University of California, Riverside, CA (both authors)

  • Venue:
  • Proceedings of the Second International Conference on Distributed Event-Based Systems
  • Year:
  • 2008

Abstract

A significant number of emerging online data analysis applications require the processing of data streams (large volumes of continuously updated data) to generate outputs of interest or to identify meaningful events. Example domains include network traffic management, stock price monitoring, customized e-commerce websites, and sensor data analysis. In this paper we address the problem of high availability in such a distributed stream processing system. Taking into account the particular characteristics of stream processing applications, we first identify design principles for a replica placement algorithm for high availability. We incorporate these principles into a decentralized replica placement protocol that aims to maximize availability while respecting resource constraints and making performance-aware placement decisions. We have integrated our replica placement protocol into Synergy, our distributed stream processing middleware. Our experimental comparison with the current state of the art on PlanetLab supports our claim that these techniques maximize availability while sustaining good performance.
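The abstract does not spell out the protocol itself, so the following is only a minimal, generic illustration of the trade-off it names (availability versus resource and performance constraints), not the Synergy protocol. It assumes node failures are independent, so a component replicated on several nodes is unavailable only when all of those nodes are down. It also assumes a single CPU constraint and a latency bound. The `Node` fields, `place_replicas`, and `component_availability` are hypothetical names introduced here. Under these assumptions, keeping the k most available eligible nodes maximizes the replicated component's availability.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Node:
    """Hypothetical candidate host for a replica."""
    name: str
    availability: float  # probability the node is up (assumed independent of other nodes)
    free_cpu: float      # spare CPU capacity, arbitrary units (hypothetical constraint)
    latency_ms: float    # latency to the downstream component (hypothetical performance metric)


def component_availability(nodes):
    """Availability of a component replicated on `nodes`, assuming independent
    failures: it is down only if every hosting node is down."""
    return 1 - prod(1 - n.availability for n in nodes)


def place_replicas(candidates, k, cpu_demand, max_latency_ms):
    """Greedy sketch: drop nodes that violate the resource or latency limits,
    then keep the k most available nodes, breaking ties by lower latency."""
    eligible = [
        n for n in candidates
        if n.free_cpu >= cpu_demand and n.latency_ms <= max_latency_ms
    ]
    eligible.sort(key=lambda n: (-n.availability, n.latency_ms))
    return eligible[:k]


if __name__ == "__main__":
    nodes = [
        Node("a", 0.90, 4, 20),
        Node("b", 0.95, 1, 10),   # excluded: not enough spare CPU
        Node("c", 0.80, 8, 30),
        Node("d", 0.99, 8, 200),  # excluded: violates the latency bound
    ]
    chosen = place_replicas(nodes, k=2, cpu_demand=2, max_latency_ms=100)
    print([n.name for n in chosen])                    # ['a', 'c']
    print(round(component_availability(chosen), 4))    # 0.98
```

In this toy example the two most available nodes overall ("d" and "b") are rejected by the latency and resource filters. This reflects the tension described in the abstract: the placement that maximizes availability in isolation may not be admissible once resource and performance constraints are applied.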