Randomized algorithms
The space complexity of approximating the frequency moments
STOC '96 Proceedings of the twenty-eighth annual ACM symposium on Theory of computing
Tracking join and self-join sizes in limited storage
PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Space-efficient online computation of quantile summaries
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Maintaining stream statistics over sliding windows: (extended abstract)
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Processing complex aggregate queries over data streams
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Dynamic multidimensional histograms
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Time Series Analysis, Forecasting and Control
Time Series Analysis, Forecasting and Control
Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries
Proceedings of the 27th International Conference on Very Large Data Bases
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
Finding Frequent Items in Data Streams
ICALP '02 Proceedings of the 29th International Colloquium on Automata, Languages and Programming
What's hot and what's not: tracking most frequent items dynamically
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Processing set expressions over continuous update streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
The design of an acquisitional query processor for sensor networks
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Adaptive filters for continuous queries over distributed data streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Gigascope: a stream database for network applications
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Adaptive stream resource management using Kalman Filters
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Finding (Recently) Frequent Items in Distributed Data Streams
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Power-conserving computation of order-statistics over sensor networks
PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Holistic aggregates in a networked world: distributed tracking of approximate quantiles
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Sketching streams through the net: distributed approximate query tracking
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Approximate Data Collection in Sensor Networks using Probabilistic Models
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Communication-efficient distributed monitoring of thresholded counts
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
A geometric approach to monitoring threshold functions over distributed data streams
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Communication-Efficient Tracking of Distributed Cumulative Triggers
ICDCS '07 Proceedings of the 27th International Conference on Distributed Computing Systems
Approximate frequency counts over data streams
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Distributed set-expression cardinality estimation
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Model-driven data acquisition in sensor networks
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
PAQ: time series forecasting for approximate query answering in sensor networks
EWSN'06 Proceedings of the Third European conference on Wireless Sensor Networks
Summarizing Distributed Data Streams for Storage in Data Warehouses
DaWaK '08 Proceedings of the 10th international conference on Data Warehousing and Knowledge Discovery
Regression on evolving multi-relational data streams
Proceedings of the 2011 Joint EDBT/ICDT Ph.D. Workshop
Prediction-based geometric monitoring over distributed data streams
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Sketch-based querying of distributed sliding-window data streams
Proceedings of the VLDB Endowment
Sketch-based geometric monitoring of distributed stream queries
Proceedings of the VLDB Endowment
Hi-index | 0.00 |
While traditional database systems optimize for performance on one-shot query processing, emerging large-scale monitoring applications require continuous tracking of complex data-analysis queries over collections of physically distributed streams. Thus, effective solutions have to be simultaneously space/time efficient (at each remote monitor site), communication efficient (across the underlying communication network), and provide continuous, guaranteed-quality approximate query answers. In this paper, we propose novel algorithmic solutions for the problem of continuously tracking a broad class of complex aggregate queries in such a distributed-streams setting. Our tracking schemes maintain approximate query answers with provable error guarantees, while simultaneously optimizing the storage space and processing time at each remote site, and the communication cost across the network. In a nutshell, our algorithms rely on tracking general-purpose randomized sketch summaries of local streams at remote sites along with concise prediction models of local site behavior in order to produce highly communication- and space/time-efficient solutions. The end result is a powerful approximate query tracking framework that readily incorporates several complex analysis queries (including distributed join and multi-join aggregates, and approximate wavelet representations), thus giving the first known low-overhead tracking solution for such queries in the distributed-streams model. Experiments with real data validate our approach, revealing significant savings over naive solutions as well as our analytical worst-case guarantees.