Monitoring data streams in a distributed system has been the focus of much research in recent years. Most of the proposed schemes, however, monitor only simple aggregated values, such as the frequency with which items appear in the streams. More involved tasks, such as feature selection (e.g., by monitoring the information gain of various features), still incur very high communication overhead when handled by naive, centralized algorithms. We present a novel geometric approach by which an arbitrary global monitoring task can be split into a set of constraints applied locally on each of the streams. The constraints are used to locally filter out data increments that do not affect the monitoring outcome, thus avoiding unnecessary communication. As a result, our approach enables efficient monitoring of arbitrary threshold functions over distributed data streams. We present experimental results on real-world data which demonstrate that our algorithms are highly scalable and considerably reduce communication load in comparison to centralized algorithms.
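
The abstract does not spell out what a local constraint looks like, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes each node keeps a local statistics vector, that all nodes share an estimate vector from the last synchronization, and that a node stays silent as long as the monitored function cannot cross the threshold anywhere inside a ball spanned by the estimate and the node's drift. To keep the range of the function over a ball exact, the monitored function is assumed to be linear; all names (`drift_ball`, `linear_range_on_ball`, `must_report`) are hypothetical.

```python
import math


def drift_ball(estimate, last_sent, current):
    """Return (center, radius) of the ball whose diameter joins the shared
    estimate to this node's drift vector (estimate + local change)."""
    drift = [e + (c - s) for e, s, c in zip(estimate, last_sent, current)]
    center = [(e + d) / 2 for e, d in zip(estimate, drift)]
    radius = math.dist(estimate, drift) / 2
    return center, radius


def linear_range_on_ball(weights, center, radius):
    """Exact min and max of f(x) = weights . x over a ball.
    Assumption: f is linear, so the extremes lie along the weight vector."""
    mid = sum(w * c for w, c in zip(weights, center))
    spread = radius * math.sqrt(sum(w * w for w in weights))
    return mid - spread, mid + spread


def must_report(estimate, last_sent, current, weights, threshold):
    """Local constraint: report only if the threshold falls inside the
    range of f over this node's ball; otherwise the update is filtered."""
    center, radius = drift_ball(estimate, last_sent, current)
    low, high = linear_range_on_ball(weights, center, radius)
    return low <= threshold <= high


if __name__ == "__main__":
    estimate = [1.0, 1.0]
    last_sent = [1.0, 1.0]
    weights = [1.0, 1.0]
    # Small local change: the ball stays well below the threshold.
    print(must_report(estimate, last_sent, [1.1, 1.0], weights, 5.0))
    # Large local change: the ball straddles the threshold.
    print(must_report(estimate, last_sent, [5.0, 1.0], weights, 5.0))
```

In this sketch a node communicates only when `must_report` returns `True`, which is the sense in which locally filtered increments avoid unnecessary communication. A non-linear function such as information gain would need its own bound over the ball in place of `linear_range_on_ball`.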