Prediction-based geometric monitoring over distributed data streams

  • Authors:
  • Nikos Giatrakos;Antonios Deligiannakis;Minos Garofalakis;Izchak Sharfman;Assaf Schuster

  • Affiliations:
  • University of Piraeus, Piraeus, Greece;Technical University of Crete, Chania, Greece;Technical University of Crete, Chania, Greece;Technion, Haifa, Israel;Technion, Haifa, Israel

  • Venue:
  • SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
  • Year:
  • 2012

Abstract

Many modern streaming applications, such as the online analysis of financial, network, sensor, and other forms of data, are inherently distributed in nature. An important query type in such application scenarios is the actuation query, where the proper action is dictated by a trigger condition placed on the current value of a monitored function. Recent work studies the problem of tracking sophisticated (non-linear) functions in a distributed manner. The main concept behind the geometric monitoring approach proposed there is for each distributed site to monitor the function over an appropriate subset of the input domain. In the current work, we examine whether the distributed monitoring mechanism can become more efficient, in terms of the number of communicated messages, by extending the geometric monitoring framework to utilize prediction models. We first describe a number of local estimators (predictors) that are useful for the applications we consider and that have already been shown to be particularly useful in past work. We then demonstrate the feasibility of incorporating predictors in the geometric monitoring framework and show that prediction-based geometric monitoring in fact generalizes the original geometric monitoring framework. We propose a large variety of prediction-based monitoring models for the distributed threshold monitoring of complex functions. Our extensive experimentation with a variety of real data sets, functions, and parameter settings indicates that our approaches can provide significant communication savings, ranging from a factor of two up to three orders of magnitude, compared to the transmission cost of the original monitoring framework.
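
The idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the monitored function is taken to be f(x) = ||x||^2, the predictor is a hypothetical constant-velocity model, and the names `local_check` and `predict` are invented here. The sketch follows the usual geometric-monitoring local test: a site builds a ball whose diameter runs from the shared estimate to the estimate plus its own drift, and it stays silent as long as the whole ball lies on one side of the threshold. With a predictor, the estimate is replaced by the predicted estimate and the drift becomes the deviation from the prediction.

```python
import math

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def ball_is_monochromatic(center, radius, threshold):
    """For the assumed function f(x) = ||x||^2, return True when every
    point of the ball lies on the same side of the threshold."""
    c = norm(center)
    f_max = (c + radius) ** 2            # largest value of f over the ball
    f_min = max(0.0, c - radius) ** 2    # smallest value of f over the ball
    return f_max <= threshold or f_min > threshold

def local_check(estimate, drift, threshold):
    """Local constraint: ball with diameter from estimate to estimate + drift.
    True means the site does not need to send a message."""
    center = [e + d / 2 for e, d in zip(estimate, drift)]
    radius = norm(drift) / 2
    return ball_is_monochromatic(center, radius, threshold)

def predict(last_synced, velocity, elapsed):
    """Hypothetical constant-velocity predictor (an assumption of this sketch)."""
    return [v + elapsed * w for v, w in zip(last_synced, velocity)]

# Toy scenario: the estimate equals the site's last synchronized vector.
threshold = 6.0
last_synced = [2.0, 0.0]
velocity = [-0.4, 0.4]       # agreed upon at the last synchronization
actual_now = [0.1, 2.0]      # the site's current local vector, 5 steps later

# Without prediction: drift is measured from the stale estimate.
static_drift = [a - s for a, s in zip(actual_now, last_synced)]
static_ok = local_check(last_synced, static_drift, threshold)

# With prediction: drift is measured from the predicted estimate.
predicted = predict(last_synced, velocity, elapsed=5)
predicted_drift = [a - p for a, p in zip(actual_now, predicted)]
predicted_ok = local_check(predicted, predicted_drift, threshold)

print("static check passes:", static_ok)
print("prediction-based check passes:", predicted_ok)
```

In this toy scenario the stale estimate produces a large ball that crosses the threshold, so the site would have to communicate, while the predicted estimate tracks the data closely enough that the small ball stays below the threshold. Setting the velocity to zero reduces the predictor to the static estimate, which is one simple way to see how a prediction-based scheme can contain the original one as a special case.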