Recent years have seen growing interest in effective algorithms for summarizing and querying massive, high-speed data streams. Randomized sketch synopses provide accurate approximations for general-purpose summaries of the streaming data distribution (e.g., wavelets). Existing work has typically focused on minimizing the space requirements of the maintained synopsis; however, to effectively support high-speed data-stream analysis, a crucial practical requirement is to also optimize: (1) the update time for incorporating a streaming data element in the sketch, and (2) the query time for producing an approximate summary (e.g., the top wavelet coefficients) from the sketch. Such time costs must be small enough to cope with rapid stream-arrival rates and the real-time querying requirements of typical streaming applications (e.g., ISP network monitoring). With cheap and plentiful memory, space is often only a secondary concern after query/update time costs. In this paper, we propose the first fast solution to the problem of tracking wavelet representations of one-dimensional and multi-dimensional data streams, based on a novel stream synopsis, the Group-Count Sketch (GCS). By imposing a hierarchical structure of groups over the data and applying the GCS, our algorithms can quickly recover the most important wavelet coefficients with guaranteed accuracy. Varying the hierarchical structure of groups establishes a tradeoff between query time and update time, allowing the right balance to be found for a specific data stream. Experimental analysis confirms this tradeoff, and shows that all our methods significantly outperform previously known methods in terms of both update time and query time, while maintaining a high level of accuracy.
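To make the idea of a hierarchy of groups concrete, the following Python sketch shows one simplified way such a structure could work. It is an illustration under stated assumptions, not the paper's construction: the class names `GroupCountSketch` and `HierarchicalTracker`, the salted use of Python's built-in `hash`, the binary (dyadic) grouping, and all parameter defaults are hypothetical. Items stand in for wavelet coefficient indices; translating a stream update into coefficient updates is omitted.

```python
import random
from statistics import median


class GroupCountSketch:
    """Simplified group-energy sketch (illustrative; hypothetical design)."""

    def __init__(self, reps=5, buckets=16, subbuckets=8, seed=0):
        rng = random.Random(seed)
        self.reps, self.buckets, self.subbuckets = reps, buckets, subbuckets
        # Per repetition: salts for the group hash, the item hash, and the sign hash.
        self.salts = [tuple(rng.getrandbits(32) for _ in range(3))
                      for _ in range(reps)]
        self.table = [[[0.0] * subbuckets for _ in range(buckets)]
                      for _ in range(reps)]

    def _bucket(self, r, group):
        return hash((self.salts[r][0], group)) % self.buckets

    def _sub(self, r, item):
        return hash((self.salts[r][1], item)) % self.subbuckets

    def _sign(self, r, item):
        return 1 if hash((self.salts[r][2], item)) & 1 else -1

    def update(self, item, group, delta):
        # Touches one counter per repetition, so update cost is O(reps).
        for r in range(self.reps):
            b, s = self._bucket(r, group), self._sub(r, item)
            self.table[r][b][s] += delta * self._sign(r, item)

    def group_energy(self, group):
        # Estimate the sum of squared values in the group: sum the squared
        # counters of the group's bucket, then take the median over repetitions.
        return median(
            sum(c * c for c in self.table[r][self._bucket(r, group)])
            for r in range(self.reps)
        )


class HierarchicalTracker:
    """One sketch per dyadic level; items must lie in [0, 2**log_n)."""

    def __init__(self, log_n, **sketch_args):
        self.log_n = log_n
        # At level l, item i belongs to group i >> l (level 0: group == item).
        self.levels = [GroupCountSketch(seed=l, **sketch_args)
                       for l in range(log_n)]

    def update(self, item, delta):
        for l, sketch in enumerate(self.levels):
            sketch.update(item, item >> l, delta)

    def heavy_items(self, threshold):
        # Descend from the whole domain, keeping only child groups whose
        # estimated energy reaches the threshold.
        frontier = [0]
        for l in range(self.log_n, 0, -1):
            sketch = self.levels[l - 1]
            frontier = [child for g in frontier for child in (2 * g, 2 * g + 1)
                        if sketch.group_energy(child) >= threshold]
        return frontier


tracker = HierarchicalTracker(log_n=4)
tracker.update(5, 3.0)   # item 5 gains weight 3.0
tracker.update(9, 0.5)   # item 9 gains a small weight
print(tracker.heavy_items(threshold=4.5))
```

In this sketch each update touches one sketch per level and each query probes two children per surviving group. A wider branching factor would plausibly mean fewer levels to update but more children to probe per query, which is one way to read the update-time versus query-time tradeoff described above.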