Ratio threshold queries over distributed data sources

Authors:
Rajeev Gupta;Krithi Ramamritham;Mukesh Mohania
Affiliations:
IBM Research, India;Indian Institute of Technology, Mumbai, India;IBM Research, India
Venue:
Proceedings of the VLDB Endowment
Year:
2013

Abstract

Continuous aggregation queries over dynamic data are used for real-time decision making and timely business intelligence. In this paper we consider queries where a client wants to be notified if the ratio of two aggregates over distributed data crosses a specified threshold. Consider these scenarios: a mechanism designed to defend against distributed denial-of-service attacks may be triggered when the packets arriving at a subnet exceed 5% of the total packets; or a distributed store chain withdraws its discount on luxury goods when sales of luxury goods constitute more than 20% of overall sales. The challenge in executing such ratio threshold queries (RTQs) lies in incurring only the minimal communication necessary to propagate updates from the data sources to the aggregator node where the client query is executed. We address this challenge by proposing schemes for converting the client ratio threshold condition into conditions on the individual distributed data sources. Whenever the condition associated with a source is violated, the source pushes its data values to the aggregator, which in turn pulls data values from other sources to determine whether the client threshold condition is indeed violated. We present algorithms that minimize the number of source condition violations (i.e., the number of pushes) while ensuring that no violation of the client threshold condition is missed. Further, for the case of a source condition violation, we propose efficient selective pulling algorithms that intelligently choose the additional sources whose data the aggregator should pull. A performance evaluation on synthetic and real traces of data updates shows that our algorithms send up to an order of magnitude fewer messages than existing approaches in the literature.
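
To make the push/pull protocol described above concrete, here is a minimal sketch in Python. It is not the paper's algorithm: it assumes the query has the form sum(x_i) / sum(y_i) > tau with a positive denominator, rewrites that as sum(x_i - tau * y_i) > 0, and gives each source a local bound such that the bounds sum to zero. As long as every source stays within its bound, the global threshold cannot be crossed, so no messages are needed. The names `Source`, `assign_bounds`, and `on_update` are hypothetical, the slack is split evenly rather than optimized, and the aggregator pulls from every source rather than selectively.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """One data source holding a numerator value x and a denominator value y."""
    x: float
    y: float
    bound: float = 0.0  # local bound on x - tau * y (illustrative, not from the paper)

    def local_value(self, tau: float) -> float:
        # Contribution of this source to sum(x_i - tau * y_i).
        return self.x - tau * self.y

    def violates(self, tau: float) -> bool:
        # Source condition: a push is needed only when the local bound is exceeded.
        return self.local_value(tau) > self.bound


def assign_bounds(sources, tau):
    """Split the global slack evenly so that the local bounds sum to zero."""
    total = sum(s.local_value(tau) for s in sources)  # <= 0 while the ratio is below tau
    slack = -total / len(sources)
    for s in sources:
        s.bound = s.local_value(tau) + slack


def on_update(sources, idx, x, y, tau):
    """Apply an update at source idx and return (pushed, threshold_crossed)."""
    src = sources[idx]
    src.x, src.y = x, y
    if not src.violates(tau):
        return False, False  # within the local bound: no message is sent
    # The source pushes; this naive aggregator then pulls from all other sources.
    crossed = sum(s.local_value(tau) for s in sources) > 0
    if not crossed:
        assign_bounds(sources, tau)  # false alarm: redistribute the remaining slack
    return True, crossed


# Three sources, threshold of 5% as in the packet example.
tau = 0.05
sources = [Source(1, 100), Source(2, 100), Source(3, 100)]
assign_bounds(sources, tau)
small = on_update(sources, 0, 3, 100, tau)    # stays within the local bound
medium = on_update(sources, 0, 5, 100, tau)   # push, but the global ratio is still low
large = on_update(sources, 1, 20, 100, tau)   # push, and the global ratio exceeds tau
```

In this sketch the even split of slack is the simplest possible choice; the abstract indicates that the paper instead chooses source conditions to minimize the number of pushes and pulls only from selected sources.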